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Abstract 

Schwaxz  alternating  method  ( is  an  old  mathematical  technique  dating  from 
1869.  It  was  commonly  believed  that  was  a  useful  tool  for  theoretical  anal¬ 
ysis  but  not  a  very  practical  approach  for  computations.  The  earlier  experiences 

r 

showed  that  converged  slowly.  In  this  thesis,  is  reexamined  and  general¬ 
ized.  The  governing  factors  of  convergence  of  fyM  are  explored  through  the  analysis 
for  the  model  problem.  Based  on  this  knowledge,  many  acceleration  schemes  can 
be  combined  with  fyM  to  yield  a  new  type  of  iterative  method  for  large  sparse  ma¬ 
trix  problems.  In  particular,  when  these  techniques  are  applied  to  the  solution  of 
the  model  problem,  an  optimal  complexity  can  be  achieved.  Some  generalizations 
of  fyM,  namely  Schwarz  splittings  (%),  are  presented  here.  For  many  important 
applications,  such  as  performing  parallel  computations  in  a  non-shared  memory 
environment,  using  composite  grids  and  also  applying  fast  solvers  in  an  irregular 
region,  %’s  are  found  to  be  useful  techniques. 

In  order  to  identify  the  types  of  problems  for  which  %’s  are  most  suitable,  a 
new  structure  for  the  linear  operators  called  template  operators  has  been  developed. 
Some  decay  results  for  the  elements  of  the  inverses  of  sparse  operators  axe  given. 
These  results  provide  a  theoretical  basis  for  determining  when  these  %  techniques 
can  be  used  successfully. 
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This  dissertation  is  a  reexamination  and  generalization  of  a  very  old  mathematical 
technique  -Schwarz  Alternating  Method  It  was  commonly  believed  that 

was  a  useful  tool  for  theoretical  analysis  but  not  a  very  practical  approach  for  com¬ 
putations.  Some  preliminary  experiments  had  indicated  that  was  not  promising 
because  it  converged  too  slowly.  There  was  little  knowledge  of  the  factors  governing 
the  convergence  of  this  method.  In  this  study,  an  analysis  of  for  the  model 
problem  for  elliptic  partial  differential  equations  is  presented.  The  convergence  fac¬ 
tor  of  is  found  to  be  a  function  of  many  components.  Based  on  the  analysis, 
many  acceleration  schemes  can  be  combined  with  this  idea  to  yield  a  competitive 
new  type  of  iterative  method  for  large  sparse  matrix  problems.  Generalizations  of 
are  als<  introduced  in  this  thesis.  We  show  that  fyM  is  not  only  suitable  to 
solve  elliptic  partial  differential  equations,  it  is  also  a  good  computational  model 
for  many  other  important  applications.  Particularly,  a  new  structural  view  of  linear 
operators  is  presented  which  provides  a  useful  tool  for  analyzing  the  behavior  of 
sparse  operators. 

Here  the  original  version  of  is  first  introduced.  Following  this  description  the 
motivations  of  this  study  are  discussed.  Then  a  brief  historical  survey  is  presented. 
At  the  end  of  this  introduction,  the  organization  of  the  rest  of  this  dissertation  is 
outlined. 

In  the  last  century,  Schwarz  [Sch69]  found  that,  for  a  region  consisting  of  the 
union  of  two  rectangular  regions  or  disks,  he  could  construct  a  sequence  of  solutions 
of  the  Laplace  equation  on  subregions  which  would  converge  to  the  solution  of  the 
Laplace  equation  on  the  union.  His  method  is  now  called  fyM-  The  description  of 
a  simple  version  of  is  as  follows: 
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Figure  1.1:  Two  overlapping  subregions 

Consider  the  Dirichlet  problem  for  an  elliptic  operator  L  1 

f  I(u)  =/,  X6fi, 

1  u  |r„  =  x  e  To 

where  11  is  a  bounded  region  in  ^-dimensional  space,  To  is  the  boundary  of  fl, 
x  =  {xj, x2,  •  •  • , Xk}  is  the  independent  variable. 

Schwarz  split  the  solution  domain  fl  into  two  overlapping  subdomains  fli  and 
112.  Let  fl12  =  Cli  fl  flj  ^  0,  To,,  To,,  Tn,,  denote  the  boundaries  of  flj,  fl2  and  fl12 
respectively  (see  fig.  1.1).  Let 

r0l  =  rl  +  r\, 

Fqj  =  r2  +  r2. 

where 


Tj  =  rnflTn,, 

1  VVe  assume  that  the  solution  of  this  problem  exists  and  is  unique. 


From  this  splitting  we  can  formulate  two  coupled  problems 

f  L(u\)  =  /,  x  G  fix , 

|  .  f  0,  x  6  Ti, 

l  u„  I6P„ 

I(u2)  =  /,  x  G 

'  ,  _  /  0,  x  g  r5, 

“2  lr°’  {  Ui,  X  G  r2. 


It  is  clear  that  it,  the  solution  of  (1.1),  is  the  solution  of  (1.2)  and  (1.3).  We 
may  also  easily  show  that: 


1*1 

-  «2, 

x  G  fll2, 

«1 

= 

x  G  ill, 

U2 

=  «, 

X  G  ^2- 

Thus,  the  problem  (1.1)  is  equivalent  to  the  pair  of  problems  (1.2)  and  (1.3).  Since 
there  are  unknowns  which  are  coupled  in  the  boundary  conditions,  we  cannot  solve 
the  two  problems  independently.  But  giving  an  initial  guess  u  |r»  =  0o,  we  will  be 
able  to  construct  a  sequence  {uj'\u2*}  as  follows: 


£(»!01)  =/. 
.<»>  I_  _  / 


Ir* 


_  f  0>  X  G  Tj, 
1  00,  X  G  n, 


x  G  fti, 
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4° 
<  lr0l 


L(u i,)) 

(i) 


x  €  T\, 
u^0,  x  €  r;, 


x  €  fti. 


(1-6) 


*'  =  1,2,  ■ 


It  can  now  be  shown  that  the  sequence  {u^,  u^}  will  converge  to  the  solutions 
{ui,u2}  of  (1-2)  and  (1.3)  under  certain  conditions  (see  section  2.3).  Then,  from 
the  solution  of  (1.2)  and  (1.3),  the  solution  of  (1.1)  can  be  constructed. 

Here  we  have  just  described  the  simplest  version  of  QM.  Unlike  some  other 
techniques  which  usually  are  precise  procedures  for  solving  problems,  basically 
gives  us  only  a  philosophy  for  solving  a  problem.  The  freedom  inherent  in  fyM 
provides  great  opportunity  to  incorporate  many  other  techniques  in  order  to  obtain 
good  performance. 


•  Freedom  in  the  geometrical  shapes  of  the  subproblems.  This  freedom  makes 
it  possible  to  tailor  the  subregions  to  meet  the  requirements  imposed  by  fast 
solvers  or  by  grids. 

•  Freedom  in  the  solution  techniques  for  subproblems.  We  are  able  to  choose 
different  solution  techniques  for  different  subproblems.  It  is  also  possible  to  use 
different  ways  to  obtain  the  solution  of  the  same  subproblem  in  the  different 
stages  of  the  computation,  allowing  us  to  use  an  optimal  approach  at  any 
particular  moment  and  in  any  particular  location.  This  is  a  unique  feature  of 

•  Freedom  in  the  numerical  model  for  each  subproblem.  Special  boundary 
shapes  or  local  behavior  of  the  solution  need  a  special  treatment  in  the  mod¬ 
eling.  The  decoupled  subproblems  allow  us  to  localize  the  special  treatment 
to  the  place  where  it  is  needed.  Composite  grids  are  a  good  example  of  this 
case. 


•  Freedom  in  the  number  of  subproblems.  This  freedom  will  permit  us  to  adapt 
this  algorithm  to  different  degrees  of  parallelism. 


•  Freedom  in  the  coupling  pattern. 

-  The  type  of  boundary  value  for  these  artificial  boundaries  in  the  decom¬ 
position. 

-  Overlapping  area. 

-  More  than  one  partitioning. 

Later  we  will  show  that  proper  use  of  these  freedoms  can  yield  an  efficient 
algorithm. 

A  particularly  interesting  application  of  SfiM  is  for  parallel  computations. 
not  only  provides  parallelism  in  the  algorithm.  Another  advantage  for  an  efficient 
implementation  of  is  the  local  communication  pattern  and  the  hiding  of  global 
information  exchange.  For  any  current  state-of-the-art  parallel  computer  the  cost 
of  the  communication  is  always  a  killer  of  efficiency.  The  relatively  cheap  cost 
of  the  hardware  provides  a  possibility  of  using  a  large  number  of  processors  to 
solve  a  big  problem.  Unfortunately,  in  the  near  future  the  communication  cost  will 
prevent  us  from  fully  taking  advantage  of  fine  grain  parallelism  in  a  general  purpose 
computer  architecture.  The  low  ratio  of  communication  verse  computation,  the 
coarser  granularity  and  the  flexibilities  we  mentioned  above  make  fyM  an  attractive 
candidate  as  a  parallel  algorithm.  This  is  one  of  the  major  motivations  for  our 
study. 

Since  this  alternating  method  appeared,  many  application  areas  have  been  found. 
Here  we  present  a  brief  historical  survey  of  the  literature: 

In  1869  Schwarz  [Sch69]  first  developed  a  method  he  called  an  alternating 
method  to  prove  the  existence  of  the  solution  of  the  Dirichlet  problem  for  the  Laplace 
equation  on  a  union  of  two  overlapping  areas.  Soon  Neumann  [Neu70]  observed  that 
a  similar  idea  could  be  applied  to  the  solution  of  the  Dirichlet  problem  in  a  region 
that  is  the  intersection  of  two  other  regions  overlapping  one  another.  Later  Poincare 
[Poi90]  developed  his  methodc  de  balayage ,  which  is  similar  to  Schwarz’s  method, 
Poincare  was  also  concerned  with  existence  proofs  rather  than  computation. 

During  the  30’s  many  Russian  mathematicians  applied  Schwarz’s  method  to 
problems  in  elastostatics.  They  treated  the  solution  process  of  as  a  search 
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for  the  minimum  of  a  variational  problem.  This  new  way  of  thinking  provided 
possibilities  for  enlarging  the  applicable  areas.  Gorgidze  [Gor34a],  [Gor34b]  applied 

to  a  plane  problem  in  the  theory  of  elasticity.  Almost  at  the  same  time,  Mikhlin 
[Mik34]  generalized  this  idea  to  a  biharmonic  problem.  He  proved  the  convergence 
of  44 M  to  the  solution  of  the  second  elastostatic  boundary  value  problem.  A  more 
general  proof  of  this  method  for  the  second  boundary  value  problems  of  elasticity  in 
three  dimensions  was  sketched  out  by  Sobolev[Sob36].  He  reduced  the  consideration 
of  convergence  of  the  sequence  of  approximations  to  a  study  of  convergence  to  the 
minimum  of  the  integral  of  strain  energy. 

In  the  early  50’s,  Kantorovich  and  Krylov  [KK58]  gave  a  set  of  sufficient  conditions2 
which  guarantee  the  convergence  of  <§4^/.  These  conditions  encompass  most  of  the 
areas  to  which  can  be  applied. 

After  the  60’s  people  started  to  apply  fyM  to  numerical  computations  rather 
than  to  existence  proofs  or  theoretical  analysis.  Some  new  algorithms  such  as  ADI 
methods  or  Fourier  series  methods  were  the  state  of  the  art  at  that  time,  but  they 
could  only  be  applied  to  rectangular  regions.  was  a  very  natural  way  of  applying 
these  methods  to  a  union  of  rectangular  regions. 

D’Jakonov  [DJa62]  derived  some  work  estimates  for  solving  Poisson’s  equation 
to  a  given  precision  on  overlapping  rectangular  regions  by  <§4/1/.  The  rectangular 
solutions  axe  by  the  alternating-direction  implicit  method,  or  a  similar  method  of 
D’Jakonov’s,  applied  to  the  5-point  difference  approximation. 

Werner  [Wer60],  [Wer63]  considered  application  of  <§4 ¥  to  any  linear  second- 
order  elliptic  P.D.E.  with  boundary  conditions  of  the  third  type.  He  proved  the 
existence  of  a  continuous  solution  and  gave  error  bounds  for  a  solution  which  satisfies 
the  differential  equation,  but  only  approximates  the  boundary  data.  He  presented 
some  numerical  results  for  the  Laplace  equation  on  an  L-shaped  region  with  mixed 
boundary  conditions.  The  rectangular  solutions  are  expressed  as  a  double  finite 
Fourier  series. 

We  have  mentioned  the  result  by  Miller[Mil65].  In  the  same  paper  he  also  gives 
work  estimates  for  several  cases.  Fairweather  and  Mitchell  [FM66]  applied  to 

JWe  will  present  them  in  section  2.3  of  the  next  chapter. 


a  9-point  difference  approximation  on  an  T-shaped  region.  They  used  a  modified 
ADI  method  to  solve  the  subdomain  problems. 

Dupont [Dup67]  generalized  their  idea  to  the  equation  V(aVu)  =  p ,  and  derived 
work  estimates  on  overlapping  rectangular  regions. 

Stoutemyer[Sto72]  applied  and  Neumann’s  variant  to  the  Laplace  equation 
on  the  union  and  intersection  of  two  disks. 

As  we  mentioned  earlier,  applications  of  fyM  to  the  composite  mesh  method  have 
attracted  people’s  attention  for  some  time.  Volkov  [V0I68]  first  presented  a  second 
order  composite  mesh  method  for  the  Dirichlet  problem  for  the  Laplace  equation; 
he  also  used  to  solve  the  system  of  linear  equations. 

Later,  Starius  [Sta77]  generalized  this  idea  to  linear  second  order  elliptic  equa¬ 
tions. 

When  computer  technology  advanced  to  parallel  processing,  the  inherent  paral¬ 
lelism  in  this  algorithm  obtained  new  appeal.  Kang[KCSQ85]  extended  the  varia¬ 
tional  form  of  to  general  second  order  elliptic  P.D.E.s,  and  tried  to  apply  it  to 
parallel  computations. 

Glowinski,  Dinh  and  Periaux  [DGP80]  [GDP80]  formulated  a  conjugate  gradient 
variant  of  fyM.  Essentially,  they  reduced  the  problem  to  a  minimization  problem 
on  the  intersection  of  two  overlapping  regions. 

Rodrigue[RS84a],  [RS84b]  [Rod86],  and  [RS85]  recast  in  terms  of  numerical 
linear  algebra  so  that  classical  techniques  of  acceleration  could  be  applied.  A  Jacobi 
splitting  of  the  modified  matrix  problem  was  studied  in  these  papers. 

Analyses  and  experiments  have  shown  that  the  convergence  rate  of  the  plain 
can  be  further  improved.  Many  authors  have  independently  found  that  SOR 
acceleration  of  fyM  works  very  efficiently.  Oliger,  Skamarock,  Tang  [OST86]  also 
noticed  that  the  sensitivity  of  the  relaxation  parameter  is  related  to  the  overlap. 
Theoretical  estimates  of  the  convergence  rate  and  choice  of  the  best  relaxation 
parameter  for  the  model  problem  are  given.  In  the  same  paper  we  mentioned 
above,  Kang  also  proved  the  convergence  of  the  SOR  acceleration  for  the  finite 
element  method[KCSQ85j.  Meier  [Mei86]  had  also  proposed  a  parallel  SOR  variant 
of  fytf. 


8 


Chapter  1.  Introduction 


In  the  next  chapter,  an  analysis  of  for  the  continuous  model  problem  in  the 
k -dimensional  case  is  presented,  a  few  of  the  important  factors  which  govern  the 
convergence  of  axe  explored  and  then  a  generalization  —  multi-color  —  and 
its  convergence  proof  are  shown.  This  generalization  is  mainly  motivated  by  parallel 
computation.  Because  as  we  mentioned  before,  can  be  viewed  as  a  general 
methodology  for  solving  a  problem,  sets  of  sufficient  conditions  for  convergence  of 
■&AT  when  applied  to  functional  equations  are  then  presented. 

Following  Rodrigue  and  Simon’s  idea,  we  propose  a  linear  algebra  analog  of 
this  model  in  Chapter  3.  The  original  matrix  is  modified  into  an  equivalent,  loosely 
coupled  matrix  which  is  called  the  Schwarz  Enhanced  Matrix  (or  SeM).  Some  equiv¬ 
alence  theorems  and  the  applicable  matrices  of  the  Spfd  are  discussed.  A  Schwarz 
splitting  (•%)  of  the  is  then  defined.  If  the  original  matrix  is  an  M- -matrix,  then 
Ss  is  a  convergent  splitting. 

In  Chapters  2  and  3,  the  analysis  concentrates  mainly  on  convergence  and  gen¬ 
eralizations.  Another  important  issue  is  the  characterization  of  problems  for  which 
SAM  is  most  suitable.  One  particular  phenomenon,  exponential  decay  of  the  in¬ 
verse  of  a  sparse  operator,  which  contributes  to  the  success  of  is  investigated 
in  Chapter  4.  We  found  that  matrices  were  not  good  structures  for  the  study  of 
this  problem.  New  data  structures,  template  vector  and  template  operator ,  for  a 
linear  operator  in  a  finite  dimensional  space  are  introduced.  Some  bounds  for  the 
norm  of  the  wavefronts  are  shown.  Particularly,  a  sufficient  condition  which  yields 
exponential  decay  of  the  inverse  is  given.  These  results  provide  some  guidance  for 
a  successful  application  of  SfiM. 

Detailed  analyses  of  the  application  of  §5  to  the  model  problem  in  one  and 
higher  dimensional  cases  are  presented  in  Chapter  5.  The  spectral  radii  of  the 
Jacobi  iterative  matrices  of  $5  for  these  cases  are  derived.  Similar  results  can  also 
be  derived  for  higher  order  difference  schemes  or  finite  element  methods.  We  show 
that  if  the  overlapping  area  is  a  constant  fraction  of  the  subproblems,  this  algorithm 
has  an  optimal  order  of  complexity.  This  means  that  the  work  needed  to  obtain  an 
approximate  solution  which  is  accurate  to  truncation  error  is  proportional  to  the 
number  of  unknowns. 


I 
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In  Chapter  6,  based  on  the  analysis  in  Chapter  5,  we  apply  several  acceleration 
schemes  to  the  different  hierarchical  levels  of  this  algorithm.  One  consequence  of  the 
analysis  of  •?>  in  the  last  chapter  is  that  many  classical  techniques  of  acceleration, 
especially  SOR  (Successive  Over  Relaxation)  acceleration,  can  be  applied  to  this 
model.  For  the  model  problem,  the  classical  analysis  of  SOR  and  many  other  accel¬ 
erations,  for  instance  Chebychev  acceleration,  cam  be  applied  to  this  case  without 
any  difficulty.  Theoretical  analyses  and  experiments  show  that  the  improvement 
in  the  performance  is  significant.  The  choice  of  the  relaxation  parameter  in  SOR 
acceleration  is  the  only  global  information  exchange  in  the  algorithm.  But  if  a  lo¬ 
cal  relaxation  method  is  used,  this  globed  communication  can  be  eliminated.  As 
we  show  in  Chapter  4,  the  mesh  size  is  only  involved  in  a  higher  order  term  of 
the  convergence  rate.  Also,  the  low  frequency  errors  dominate  the  convergence.  A 
multi-level  grid  strategy  is  appropriate  here.  In  the  last  section  we  present  some 
other  parallel  implementation  strategies  to  make  this  algorithm  a  powerful  parallel 
algorithm. 


Chapter  2 


Convergence  Analyses  and  Multi-Color  SAM 


In  this  chapter  an  application  of  to  the  continuous  model  problem  for  elliptic 
partial  differential  equations  in  the  fc-dimensional  case  is  given  in  detail.  Through 
the  analysis  of  this  example  many  important  factors  which  affect  the  convergence 
rate  of  ^  are  disclosed.  This  analysis  provides  useful  guidance  for  an  efficient 
implementation  of  ^  Motivated  by  the  parallel  computations,  a  new  parallel  im¬ 
plementation  of  multi-color  and  its  convergence  proof  are  presented  in  Section 
2.2.  Finally,  two  sets  of  very  general  sufficient  conditions  for  the  convergence  of 
in  the  literature  are  listed. 

2.1  An  Analysis  of  SAM 

In  this  section  we  will  apply  fyM  to  the  model  problem  on  a  uniform  cube  in  fc- 
dimensional  space.  Through  the  analysis  of  the  solution  process  in  this  problem 
we  can  demonstrate  many  important  characteristics  of  this  method  .  In  later  chap¬ 
ters  we  will  take  advantage  of  these  features  to  make  a  competitive  iterative 
algorithm. 

Consider  the  Dirichlet  problem: 

{Au(x  i,---, z*)  =  /(x!,---,x*), 

wlrn  *$(* i»- ••»**)• 

where  fl  is  a  fc-dimensional  uniform  cube  for  which  the  lengths  of  the  edges  are  all 
equal  to  a.  The  restriction  to  uniform  length  is  only  for  convenience  of  discussion; 
generalization  to  different  lengths  in  different  coordinates  is  straightforward. 

Let  us  decompose  this  cube  into  two1  overlapping  subregions.  Suppose  the 
1  We  can  generalize  this  analysis  to  the  case  which  has  any  finite  number  of  subregions. 
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2.1. 


B  F  D  H 


Figure  2.1:  Two  overlapping  rectangular  grids 

overlapping  direction  is  X\.  Figure  2.1  shows  a  two  dimensional  case2.  Denote  c 
as  the  width  of  the  overlap,  and  b  as  the  length  of  the  subcube  in  the  overlapping 
direction  Xi.  Here  Tj  is  CD  and  T,  is  EF.  Apply  the  algorithm  1.4-1. 6  to  these 
two  overlapping  regions;  the  sequence  will  eventually  converge  to  the  solution  on 
the  uniform  cube.  Denote  as  the  i-th  approximate  solutions  in  fli  and  fl2l 

U!  and  u2  the  true  solutions  in  the  two  subregions.  Let 


4° 

— f» 

-  **i, 

4" 
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In  • 

3If  we  may  imagine  the  lines  AB,  CD,  EF  and  HG  are  k  —  1-dimensional  uniform  cubes  Then 
it  can  also  represent  ^-dimensional  case. 
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Then  the  error  functions  s['^  and  will  satisfy  the  following  relations: 


a4(0»  =o 


c(°)  |_ 

To, 

a4°  =o 


■i 


,(°) 

ea 


x  €  Ti 

x  6  r; 


xeTj 
>  xer; 


•“  <  -  j  :r 

Ae}0  =0 

-(*)  |  _  /  0  x  €  r» 

1  lrn«  "  \  e?  x  €  r; 


x  £  fli 


X  6  fl2 


x  €  fii 


(2.1) 


(2.2) 


(2.3) 


Now  we  are  able  to  analyze  the  convergence  process  by  Fourier  analysis.  Since 
the  error  functions  of  the  approximation  satisfy  the  Laplace  equation  and  have 
boundary  values  0  except  at  one  face  of  the  subcube,  by  expanding  the  boundary 
value  e-j)  at  T'  in  Fourier  series,  we  may  express  the  error  function  in  the  whole 
subcube  in  terms  of  the  coefficients  of  the  boundary  values.  Let 


.(o)  _  v  «(°)  t2X2ir 

-a  “  2-  a«3  •* 


sin 


sm 


Let 


••  ,*fc)  =  off..,*  sin 


i2x27r 


’  sm 


7T 


a  a 

*2>"*  »**  =  1.2,- 


and 


Then  we  have 


r(»2,  •••,**)  =  yj* I  +  •••  +  **• 


(0)  ^  sinhr(*2,---,**)^ 

£1  =  E  tttttt: - * **)• 


sinh  r(i2,  •  •  •  ,  **)£ 
The  boundary  value  at  r2  will  be 

(0)  ^  sinhr(*2,---,i*)ttrl 


sinhr(i2,-..,tfc)f 


■MM 
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By  using  the  same  reasoning,  we  have: 


,U> 

e2 


B 


sinh  r(:2, 


sinh  r(i2, 


,  2  , . 

-  *  •)  ^(l2, 


,  **)• 


Thus  the  amphfication  factors  axe: 


sinhr(t2,---,u)^yl 

sinh 

*2,  =  1,2,--- 


(2.4) 


It  is  obvious  that  the  amplification  factor  of  the  lowest  frequency  component 
will  dominate  the  convergence.  If  we  define  the  convergence  factor  as  the  ratio  of 
the  two  norms  of  consecutive  error  functions,  the  convergence  factor  of  this  method 
for  the  model  problem  is: 

sinh 

linh  ^6  • 

a 

From  2.4  and  2.5  we  can  observe  a  few  important  facts  of  First  the  over¬ 

lap  ratio  c/6  has  a  strong  influence  on  the  convergence  rate.  That  is:  when  the 
overlap  increases  the  convergence  factor  will  improve  exponentially.  Also,  the  ratio 
6/a  will  affect  the  convergence.  It  is  clear  that  we  should  avoid  overlapping  in  a 
direction  for  which  the  width  of  the  subregion  is  too  short  in  comparison  with  the 
other  directions.  Another  important  observation  is  that  the  amplification  factors 
exponentially  decay  when  the  frequencies  increase.  This  is  a  favorable  feature  for 
multilevel  grid  strategies.  We  can  start  the  computations  with  a  very  coarse  grid  to 
obtain  coarse  frequency  information,  then  reduce  the  grid  size  to  obtain  the  higher 
frequency  information.  Moreover,  2.4  tells  us  that  the  high  frequency  errors  do 
not  make  a  significant  contribution  to  the  error  inside  of  the  region.  So  we  might 
carry  on  the  communication  at  some  coarser  grid  level  in  order  to  reduce  the  com¬ 
munication  cost.  Another  important  feature  of  especially  propitious  for  large 
scale  computations,  is  that  the  higher  the  dimension  the  faster  the  convergence.  In 
later  chapters  we  will  elaborate  these  characteristics  of  SfiM  in  depth  and  apply  sev¬ 
eral  acceleration  schemes  simultaneously  to  this  model  to  construct  a  very  efficient 
algorithm. 
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Using  the  same  idea  and  notation  we  can  easily  analyze  the  case  of  applying 
Neumann  boundary  condition  on  CD  and  EF.  Now  equations  1.2  and  1.3  become  : 


=  /, 

“i  iTx 

=  0, 

im.  1—,, 

_  dui 

dn  Tj 

dn  1 

Cl 

<3 

=  /, 

“a  |r2 

= 

dui  | 

_  du\ 

dn 

dn  * 

x  G  Hi, 


z  € 


(2.6) 


(2.7) 


The  corresponding  equations  for  the  error  functions  are: 
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o> 

II 
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z  €  fii, 

40,lri  =o, 

Clr; 

(2-8) 

A^0  =0, 

z  G  fl2, 

4°  lr,  =  °* 

ac(,°  i  a*(1'-,) 
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(2.9) 

o 

ii 

o 

i  G  flx, 

sf’lr,  =o. 

9$'  i  a//1 

5n  1  r,  dn  ' 

(2.10) 

(2.11) 


Similarly,  we  may  expand  the  error  in  the  Neumann  boundary  condition  problem 
in  a  Fourier  series: 


e<°> 


IjXjTT 

sin - 

a 


•  sin 


ikXk* 

a 


The  solution  of  the  error  in  fli  is  as  follows: 
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By  the  same  reasoning,  we  know  that  the  convergence  factor  of  the  error  on  the 
artificial  boundaries  is: 

(2.12) 

It  is  not  difficult  to  see  that  the  Dirichlet  boundary  condition  on  these  artificial 
boundaries  works  better  than  the  Neumann  boundary  condition.  In  Chapter  6 
we  will  show  that  this  conclusion  is  also  true  for  discrete  cases.  There  are  also 
other  kinds  of  combination  of  different  boundary  conditions  can  be  imposed  on 
these  artificial  boundaries.  For  the  Laplace  operator,  we  have  shown  that  Dirichlet 
boundary  conditions  are  better.  This  conclusion  can  also  be  derived  from  the  decay 
rate  of  the  Green’s  function.  We  will  not  present  the  details  here.  We  could  not 
extend  this  analysis  to  obtain  a  general  conclusion.  I  conjecture  that  for  different 
problems  the  best  choice  of  the  type  of  boundary  conditions  may  vary.  It  is  a  very 
interesting  open  problem  for  future  research. 

2.2  Multi-Color  SAM  and  Its  Convergence 

The  was  originally  constructed  as  a  sequential  process  by  Schwarz  in  1S69. 
But  the  inherent  parallelism  in  this  idea  provides  many  possibilities  of  constructing 
some  highly  parallelized  implementations.  In  this  section  a  multi-color  f&M  for  the 
solution  of  a  second  order  linear  elliptic  PDE  is  presented.  The  convergence  proof 
of  this  method  is  also  given. 

A  simplest  parallel  implementation  of  is  two  color  or  red-black  fyM.  It  is  a 
natural  extension  of  the  red-black  SOR  algorithm.  The  basic  idea  of  red-black  SfiM 
is  as  follows:  construct  two  sets  of  partitions  of  the  solution  region3,  red  and  black, 
given  some  initial  guess  on  the  artificial  boundaries  in  the  red  set  of  subregions, 
solve  the  red  set  of  subproblems  independently.  Using  the  solution  of  the  red  set 
for  the  value  of  the  artificial  boundaries  in  the  black  set,  solve  the  subproblems  in 
the  black  set  independently  and  repeat  this  process.  If  the  partitioning  provides  a 
balanced  load  for  each  processor,  this  implementation  is  a  highly  parallel  algorithm. 
But  from  the  analysis  in  Section  2.1,  we  know  that  the  error  reduction  varies  from 

3They  have  to  cover  the  whole  solution  region. 
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different  positions  in  each  subproblem.  The  new  boundary  values  for  these  artificial 
boundaries  should  be  taken  from  fast  convergent  zones  of  a  partitioning  in  which 
this  artificial  boundary  resides.  If  we  only  have  two  sets  of  partitions,  there  is  only 
one  choice  for  each  point  on  these  artificial  boundaries.  The  best  partitioning  and 
choice  of  boundary  values  is  very  difficult  to  accomplish.  Instead,  we  may  plan  a 
few  sets  of  partitions  so  that  each  point  on  the  artificial  boundaries  belongs  to  a 
fast  convergent  zone  in  at  least  one  of  the  partitions.  With  more  than  two  sets  of 
partitions,  this  goal  is  easier  to  be  fulfilled.  The  same  idea  can  be  applied  to  the 
elliptic  operator  L  in  any  finite  dimensional  space.  This  implies  that  the  multi-color 
&W  can  also  be  used  to  solve  any  Unear  system  of  equations  which  has  a  positive 
definite  coefficient  matrix. 

To  simphfy  the  notation,  a  description  of  the  algorithm  for  a  two-dimensional 
problem  is  given  here.  The  extension  to  higher-dimensional  problems  is  straight¬ 
forward. 

Let  £2(ft)  be  a  Hilbert  space  with  respect  to  the  inner  product 

(u,u)  =  /  uvdCl, 

Jo 

and  the  norm 

II « 11=  v/,(u’u)’ 

where  Cl  is  a  bounded,  connected  open  set  in  R3.  Ci(ft)  denotes  the  space  of  real 
valued  continuously  differentiable  functions  on  ft,  where  Cl  ~  OUT  and  T  is  the 
boundary  of  ft.  Let 


a  =  (ai,ar2), 


Da  = 


d\o\ 

dxa'dyai ' 


|  a  |=  <*i  +  a2,  <*i,a2  >  0 


u(“)  =  Dau. 


Then  the  Sobolev  spaces  JI1(Cl)  and  axe  defined  as 


H'(Cl)  =  {u  |  D°u  6  L2(ft),0  <|  a  |<  1} 
HX<3(T)  =  tu  |  D°u  e  L2(T),0  <|  a  |<  1/2} 
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Particularly, 


Hm  =  {u  \(u  G  tfl(ft))U(u  |r=  0)} 

H'3(Cl)  ={u\(u€H\n))U(u\r=g)} 

Consider  the  boundary  value  problem 

(  L(u)  =  -[piur]r  -  \piUy\y  +  qu  =  /, 


u  lr  =  0, 


(2.13) 


where  pi,pi,q  G  Ci(fi),/  €  Li(Q.),g  G  It  is  well  known  that  this  problem 

(2.13)  is  equivalent  to  the  minimization  problem 


=  2an(u>«)  ~  (u), 


« €  hu  n, 


where 


^n(ti,v)  =  ^[piu*ur  +  +  ?uv]  dJ2, 

Gn(«)  =  [  fudQ. 

Jo 

Construct  a  sequence  of  partitions  of  fi: 


n.: 


n. :  - ■  .,«'*»} 


such  that 


1.  «|,)nftj,)  =  0,  if  ijtj. 

2.  Vx  6  3  i  and  j  :  x  G 


Let  =  (J  i  =  1,  • . . ,  k.  Each  11*  is  referred  to  as  a  color  c/.  So  condition 
;=i 

(1.)  means  that  any  two  of  the  subregions  do  not  overlap  if  they  have  the  same 
color.  For  any  t?G  tfj(ft)  and  a  subspace  $(n<'>,v)  is  defined  as  follows: 


$(n(,),u)  =  {u|uG/fJl(n)n(u  =  v,  if  iGn-fi1'*)}. 
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For  partition  II/,  the  functional  7(ft,u)  can  be  calculated  as  follows: 


7(fl,u)  —  52 /(ft*1*,  u)  +  I(£2  —  u)  =/(c{,  u). 


The  multi-color  is  then  as  follows: 


Algorithm.  1  Choose  a  initial  guess  IVe  construct  a  sequence 

{u^,c,}  such  that 

u(,)  =  {u  I  inf  I(c/,U?)} 

and  V  7  >  0  and  J  €  [1, 2,  •  •  • ,  Ar],  3  i  such  that  i  >  I  and  c,-  =  J. 


Since  i  ^  j  implies  D  f =  0, 


inf  J(cif«7)  =  y'  inf  /(ftl^.uj). 


comes 


Each  /(fij^.u?)  can  be  computed  independently.  That  is  where  parallelism 
from. 

A  rigorous  complete  proof  of  convergence  is  wordy.  Here  we  present  an  concise 
version  of  the  proof. 

From  the  construction  of  the  algorithm,  V  i  >  0 


If  u  denotes  the  solution  of  (2.13),  then 

j|  u(‘>  -  u  || l  <  -an(u(,)  -  u,  „«  -  u) 


7 

2, 


=  -[/(«,  «(.•))  -  /(a «)] 

7 

<  ^[/(n,«(o))-/(n,«)]. 


So  there  is  at  least  one  subsequence  such  that 

lim  u*'**  =  u. 


(2.14) 


L  "  -  ■f- 


2.2.  Multi-Color  SAM  and  Its  Convergence 


19 


From  the  construction  of  the  algorithm  and  (2.14),  tt  €  .ffj(Q)  achieves  the  minimum 
in  every  subregion  Now,  let’s  prove  that  u  actually  is  the  solution  of  (2.13). 

Let  be  the  boundary  of  the  subregion  flP  ,  which  is  the  ith  subregion  in 
color  j.  Tp  is  consists  of  two  parts.  First  part  is  r,-^(l)  =  U  To-  It  can  be 
empty  if  there  is  no  common  part  in  the  botmdaries  of  rj^  and  Tn-  The  second 
part  is  the  so-called  artificial  boundary  (2).  It  is  a  -union  of  the  pieces  which 
are  located  in  other  subregions  (of  different  colors). 

From  the  definition  of  u,  we  know  that  it  is  the  solution  of  the  following  problem. 


£(<4m|)  =/, 


*  €  r£>(i), 
x  €  r&>(2), 


x  € 

(2.15) 


n  —  1, 2,  •  •  *  ,  imy 
m  =  1, 2,  •  •  • ,  k 

where  is  th  nth  subregion  in  color  Cm  and  are  the  solution  for  partition 

nm_!.  The  coupled  problems  (2.2)  and  (2.3)  axe  the  simplest  case  of  this  problem. 

First  we  know  that  the  solution  u  of  (2.13)  is  a  solution  of  (2.15).  It  is  also  known 
that  if  the  solution  of  (2.15)  exists,  it  is  unique.  Therefore,  we  may  summarize  the 
above  discussion  as  the  following  theorem: 

Theorem  2.1  For  any  initial  guess  6  ,ffj(f2)  sequence  constructed  in  the 
multicolor  algorithm  converges  to  the  solution  of  (2.13). 

The  extension  of  this  algorithm  to  a  matrix  problem  is  straightforward.  We  may 
replace  the  sequence  of  subregions  by  a  sequence  of  diagonal  block  matrices  A*'] 
of  the  original  matrix  A,  which  satisfy  the  following  conditions: 


1.  Any  two  of  the  diagonal  blocks  do  not  overlap  if  they  have  the  same  color. 

2.  Each  row  of  the  matrix  A  is  covered  by  at  least  by  one  of  the  blocks  Aj'J. 

Then  the  rest  part  of  the  algorithm  is  the  same  as  the  continuous  case.  The 
convergence  of  the  discrete  version  of  multi-color  fyM  is  also  analogous  with  the 
continuous  case. 
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2.3  Sufficient  Conditions  for  Convergence 

There  are  several  proofs  of  convergence  of  [KK58],  [CH62],  [KCSQ85].  The  most 
general  case  was  given  by  two  Russian  mathematicians  Kantorovich  and  Krylov 
in  the  50’s  [KK58].  They  showed  that  five  conditions  together  are  sufficient  for 
convergence  of  to  a  solution  of  a  boundary- value  problem  4 

L{u)  =/, 

«  lrn  =  0, 

These  five  conditions  are  as  follows: 

Uniqueness.  Two  solutions  u  and  u'  which  satisfy  equation  (2.16)  in  ft  are  bounded, 
have  identical  values  on  the  boundary  Tn  (except,  perhaps,  at  a  finite  set  of 
points)  and  are  identically  equal  in  ft. 

Monotonicity  5.  Two  bounded  functions  «  and  u'  which  satisfy  equation  (2.16) 
in  ft  and  have  u  >  u'  on  To  (except,  perhaps,  at  a  finite  number  of  points) 
will  satisfy  u  >  u1  everywhere  in  ft. 

Limit  solution.  The  limit  of  any  monotone  and  uniformly  bounded  sequence  of 
solutions  to  equation  (2.16)  is  also  a  solution  of  (2.16). 

Maximum  principle.  A  solution  to  (2.16)  cannot  have  either  a  positive  interior 
maximum  or  a  negative  interior  minimum.  For  linear  problems  this  implies 
the  monotonicity  condition. 

Continuity  onto  the  boundary.  If  u  =  /  on  a  boundary  segment  except  perhaps 
at  a  point  P  inside  the  segment,  where  /  is  continuous  on  this  segment,  then 
the  solution  u(Q)  for  Q  in  ft  approaches  f{P)  a s  Q  —*  P. 

The  numerical  analog  of  &A/  is  straightforward.  We  can  discretize  the  problems 
1.4-1.6,  and  then  solve  them  numerically.  Miller[Mil65]  showed  that  the  following 

4  As  mentioned  in  their  book,  this  same  proof  can  be  applied  to  a  more  general  functional  equation. 
sFor  linear  problems,  this  condition  can  be  derived  from  the  maximum  principle. 


X6fl, 

x  6  To- 


(2.16) 


v.Vi-f 


1  ^  “  m: 
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conditions  axe  sufficient  for  convergence  of  the  solution  of  the  numerical  to  the 

original  continuous  problem: 

Existence  of  a  continuous  solution.  The  solution  of  the  continuous  problem 
2.16  exists.  This  implies  that  the  solutions  of  the  problems  1.2  and  1.3  exist. 

Existence  of  the  discrete  solutions.  Solutions  of  the  discretized  problems  1.2 
and  1.3  exist. 

Convergent  discretization.  Discrete  approximations  of  1.2  and  1.3  are  conver¬ 
gent  to  the  continuous  solution  of  1.2  and  1.3. 

Contraction  mapping.  There  exist  numbers  Qi  <  1,  Qj  <  1  ,  such  that  Q1Q2  < 
1  and 


||«i  ~  ^1 1|  <  Qi <1, 

||u2  -  Sal)  <  02*2, 

where  ei,e2  are  perturbations  of  the  boundary  data  on  ,  Tj ;  Sj,  u2  are  the 
perturbated  solutions  which  correspond  to  uj,  and  u2. 

For  elliptic  partial  differential  equations  we  can  also  express  problem  2.16  in  an 
equivalent  variational  form;  then  it  is  possible  to  prove  that  the  solution  sequence 
of  the  corresponding  finite  element  method  is  a  convergent  minimization  sequence. 
The  independence  between  convergence  and  the  the  ordering  of  the  solutions  of 
these  subregions  can  be  easily  shown  in  variational  form  [KCSQ85]. 

We  can  also  recast  the  numerical  analog  of  fyM  as  a  modified  matrix  problem, 
then  prove  its  convergence.  From  analysis  of  the  linear  algebra  analog  of  SflM  for  the 
model  problem  we  can  obtain  many  new  results  by  applying  classical  acceleration 
approaches  in  numerical  linear  algebra  to  this  method.  In  Chapters  5  and  6  we  will 
discuss  these  problems  in  detail. 


Chapter  3 


Schwarz  Splitting 

In  this  chapter  a  more  general  model  of  for  application  to  problems  in  linear 
algebra,  Schwarz  Splitting  (or  3>),  is  presented.  For  a  matrix  equation  Ax  =  /,  we 
first  introduce  a  Schwarz  Enhanced  Equation  (or  Ax  =  /.  The  corresponding 
matrix  A  is  called  a  Schwarz  Enhanced  Matrix  (or  Sfjflf).  A  necessary  and  sufficient 
condition  for  the  equivalence  of  the  original  equation  and  is  shown.  In  Section 
3.3  a  few  splitting  matrices  of  are  presented.  In  particular,  the  Schwarz  Splitting 
(or  $5)  is  defined.  Then  some  relations  between  the  eigenvalues  of  these  splitting 
matrices  and  the  corresponding  splitting  matrix  for  the  original  matrix  are  shown. 
If  the  original  matrix  is  an  Af-matrix,  then  %  is  a  convergent  splitting.  The  original 
is  equivalent  to  applying  a  block  Gauss-Seidel  scheme  to  the  It  is  clear  that 
other  classical  acceleration  schemes  can  also  be  applied  to  this  model. 

3.1  Definitions 

As  we  mentioned  in  our  last  chapter,  the  approach  of  to  a  problem  is  to  modify 
it  to  produce  an  equivalent  enhanced  problem,  then  to  solve  the  new  one  iteratively. 
It  is  not  necessary  to  view  only  as  a  way  of  solving  elliptic  partial  differential 
equations.  As  we  mentioned  in  the  introduction,  can  be  viewed  as  a  general 
methodology  for  problem  solving.  A  similar  idea  has  been  applied  to  a  nonlinear 
problem  arising  in  circuit  simulations  [Deu85].  It  was  suggested  that  the  application 
of  to  the  system  of  ODE  is  promising.  Here  is  discussed  in  terms  of  matrix 
theory.  In  Rodrigue  and  Simon’s  paper  “A  generalization  of  the  numerical  Schwarz 
algorithm”,  fyM  is  first  recast  into  numerical  linear  algebra.  Then  many  results 
of  the  classical  analyses  in  linear  algebra  [Var62]  could  be  applied.  This  approach 


3.1.  Definitions 


provided  possibilities  of  generalizing  and  improving  We  extend  this  thought 
further  to  a  general  linear  system  of  equations  in  this  chapter. 

Consider  a  matrix  problem: 

Ax  =  /,  (3.1) 

where  A  is  an  N  x  N  nonsingular  matrix,  /  and  z  are  N  vectors.  A  partitioned 
form  of  the  equation  (3.1)  will  be  used  in  the  rest  of  this  thesis.  A  partitioning 
is  defined  by  the  integers  nj,  n2,  •  •  • ,  n^k+i  where  n2j  >  0,  n2t+i  >  01  for  all  i,  and 
where 

4-  n2  +  —  +  n2*+i  =  N.  (3.2) 

Given  a  set  {n^}*** 1  which  satisfies  (3.2),  the  (2 k  +  1)  x  (2k  +  1)  partitioned  form 
of  the  matrix  A  is  then  given  by 


Aj.i  Ai)2 

A2i  i  a2>2 


Ai,2*+i 

A2,2*+i 


Aak+i.j  A2k+i,2 


A2*+i,2*+i 


where  Aij  is  an  n,  x  n3  submatrix.  We  always  assume  that  the  unknown  vector  x 
and  the  known  vector  /  in  the  matrix  equation  Ax  =  /  are  partitioned  in  a  form 
consistent  with  A.  Thus,  if  A  is  given  by  (3.3),  then  x  is  assumed  to  be  partitioned 


x  —  [xi,x2,--, x2*+i]  ,  (3.4 ) 

where  x,  is  an  n,  x  1  matrix  (column  vector).  A  dual  vector  of  x 

X  =  [x i ,  X2 ,  X2,  X3,  X4 ,  X4,  X5,  ,  X2lci  %2ky  % JIt+l]  (3.5) 

is  defined  such  that:  all  even  subvectors  x2>,  i  =  l,***,fc  are  duplicated  once  in 
their  places,  and  all  odd  subvectors  remain  the  same. 

A  partitioned  matrix  can  also  be  represented  by  a  directed  graph.  Consider  any 
2Jt  4-  1  distinct  points  Pi,  P2,  • ,  Pik+i  in  the  plane,  which  we  shall  call  nodes.  For 

every  nonzero  entry  Ait]  of  the  matrix  A,  we  connect  the  node  P,  to  the  node  P}  by 


'We  will  explain  the  reason  for  this  partitioning  pattern  later. 


a 


V 

V 
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A22 


A33 


Figure  3.1:  Directed  graph  G(A). 


means  of  a  path  2  P,P},  directed  from  Pi  to  Pj,  as  shown  in  Figure  3.1.  By  relating 
each  path  PtP}  to  the  corresponding  A,j,  the  matrix  A  is  associated  with  a  finite 
directed  graph  G(A).  As  an  example,  a  dense  3x3  partitioned  matrix 


A  = 


[  Ai.i  At, 3  Aia  ] 

A34  Aij  A2j 

A3,i  A3, 2  A.3,3 


has  the  directed  graph  C»(A)  in  Fig.  (3.1). 

If  the  operator  L(u)  in  equation  (1.1)  is  a  linear  second  order  elliptic  operator, 
then  the  discretized  problem  can  be  written  as  a  matrix  equation  : 


An 

A12 

A13 

Ax  = 

Aji 

A22 

A23 

A31 

A32 

A33 

Xl 

x 2 
X3 


h 

fl 

h 


=  /• 


(3.6) 


The  order  of  the  unknowns  is  arranged  so  that  [xi,  xj]  corresponds  to  the  unknowns 
in  fli,  [ij,  X3]  corresponds  to  the  unknowns  in  flj  and  [xj]  corresponds  to  the  un- 
knowns  in  flu,  which  is  the  overlapped  part  of  the  two.  The  numerical  for  the 


JFor  a  diagonal  entry  A, ,  ^  0,  the  path  joining  the  node  Pi  to  itself  is  called  a  loop.  For  an 


illustration,  see  Figure  (31) 


E&a 


above  problem  solves  the  following  subproblems  alternatively: 


- » 

►-* 

*■* 

so 

_ 1 

\  X?  1 

‘  ft  ' 

A13 

.  . 

rs: 

+ 

An  A12 

.  ft  . 

A2  3 

A22 

A23 

'  fi  ' 

+ 

A13 

A35 

A  33 

x(0 
13  J 

.  . 

A23 

It  is  not  difficult  to  observe  that  this  procedure  is  equivalent  to  a  2  x  2  block  Gauss- 
Seidel  iteration  for  the  following  matrix  equation: 


Ax  = 


All  A12  0  A13 

XX 

'  h  ' 

Aji  A22  0  A  23 

X2 

fi 

A2i  0  A22  A23 

x'2 

ft 

A31  0  A32  A  33 

.  *3  . 

.  / 3  . 

(3.8) 


From  the  convergence  proof  discussed  in  the  last  chapter  we  know  that  the  procedure 

(3.7)  will  converge,  the  solution  of  equation  (3.8)  satisfies  x2  =  x2,  and  [5j,  x2,  x3]T 
is  a  solution  of  equation  (3.6).  This  is  to  say  that  the  dual  vector  of  the  solution 
of  (3.6)  is  the  solution  of  (3.8)  and  vice  versa3.  We  shall  call  the  equation  (3.8)  the 
Schwarz  Enhanced  Equation  (or  S[iP)  of  (3.6)  and  the  corresponding  matrix  A  in 

(3.8)  the  Schwarz  Enhanced  Matrix  (or  of  the  matrix  A.  The  formation  of 
SeM  can  also  be  illustrated  in  terms  of  a  directed  graph.  As  we  mentioned  before, 
the  original  matrix  is  represented  by  the  directed  graph  in  Fig.(3.1).  Let  us  split 
node  2  into  a  pair  of  dual  nodes  (P2,  P2),  and  let  the  incoming  path  from  Pi  point 
to  P2,  and  the  incoming  path  from  P3  point  to  P2. 

The  loop  path  of  the  original  node  is  duplicated  for  both  dual  nodes  (see  Fig. 
3.2  ).  This  new  directed  graph  is  called  the  dual  graph  for  the  A. 

This  idea  of  forming  a  new  equivalent  problem  can  be  generalized  in  two  wavs: 
we  may  enhance  the  new  enhanced  equation  recursively;  or  we  may  partition  the 
matrix  A  into  a  matrix  like  (3.3)  and  then  enhance  this  partitioned  matrix.  Here 

3Later  we  will  prove  that  this  conclusion  can  be  true  only  when  A22  exists.  For  most  approxi¬ 
mations  of  an  elliptic  partial  differential  equation  this  restriction  is  satisfied. 
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Aw 


Ai  2 


A33 


Figure  3.2:  Dual  graph  G(A). 

we  only  discuss  the  latter  approach.  The  results  of  the  following  discussion  can  be 
easily  applied  to  the  former. 

Let  the  matrix  A  in  (3.1)  be  partitioned  in  the  form  (3.3),  where  n3,  >  0  and 
n2l+ 1  >  0  for  all  i.  The  reason  for  assuming  n3t  >  0  is  that  we  are  going  to  split 
the  node  P2,  and  splitting  a  null  node  *  is  meaningless.  On  the  other  hand,  a  null 
nonsplitting  node  can  be  used  to  cover  the  case  of  adjacent  nodes.  The  case  of 
splitting  a  node  into  three  or  more  nodes  can  be  covered  by  recursive  splitting,  but 
there  is  little  practical  reason  to  do  that.  To  form  the  Schwarz  Enhanced  Matrix 
A ,  we  first  split  every  even  node  P2l  into  a  pair  of  dual  nodes  (P2l,  P2|),  i  =  1,  •  •  • ,  fc, 
and  copy  every  odd  node  Pn+i  to  a  new  node  Pn+\ .  The  even  nodes  P2,  are  also 
called  overlap  nodes.  The  new  nodes  P,  and  P[  are  the  nodes  of  the  directed  graph 
G(A)  for  the  A.  Then  for  each  path  P/Pm  in  the  original  G(A),  we  will  put  a 
corresponding  path  (or  paths)  into  the  dual  graph  G(A).  The  rules  jure  listed  in  the 
following  table.  The  far  right  column  lists  six  logical  expressions,  of  which  only  one 
can  be  true  for  each  path  in  G(A).  After  identifying  the  case  for  which  the  logical 
expression  in  the  fourth  column  is  true,  we  will  add  the  path  (or  paths)  in  the  third 
column  to  G(A).  The  corresponding  entry  for  each  path  (or  paths)  is  given  in  the 
second  column.  Let  Sa  and  Se  denote  the  sets  of  odd  numbers  and  even  numbers, 

*Here  we  define  a  node  as  null  node  if  there  is  no  path  to  or  from  this  node.  Equivalently,  we 
may  say  that  a  node  P,  is  null  if  m  =  0. 
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respectively.  We  have  the  following  table: 


Case 

Entry 

Path 

Condition 

Case  1 

An 

Pi  Pi 

(/  =  m)n(/€  So) 

Case  2 

An 

P/P/  PIP! 

(l  =  m)  D  (/  €  5.) 

Case  3 

A/m 

P/Pm 

(/  ^  m)  PI  (/  €  S0)  fl  ((m  €  S0 )  U  (/  >  m)) 

Case  4 

Aim 

PiK 

(/  ^  m)  n  (/  6  S0)  H  ((m  6  Se)  D  (/  <  m)) 

Case  5 

Aim 

(/  #  m)  n  (/  €  Se)  n  ((m  €  S„)  U  (/  >  m)) 

Case  6 

Aim 

(/  ^  m)  n  (/  6  se)  n  ((m  e  Se)  n  (/  <  m)) 

We  may  also  interpret  this  table  graphically.  As  in  Fig.  (3.1)  and  (3.2),  we  may 
lay  the  2k  -f  1  nodes  in  a  straight  line,  with  the  nodes  in  numerical  order.  Then  we 
split  each  even  node  into  P'2i  and  P2i,  one  by  one,  starting  from  P2.  AH  incoming 
paths  from  the  left  side  of  P2,  wiU  point  to  P^;  and  all  incoming  paths  from  the 
right  wiU  point  to  P2t.  The  outgoing  path  from  P2,  is  then  split  into  two  outgoing 
paths  for  both  new  nodes  and  wiH  point  to  the  same  destination  as  before.  All  loops 
of  the  overlapped  nodes  wiU  be  duplicated  for  both  dual  nodes.  After  we  split  all 
even  nodes,  the  new  graph  is  the  dual  graph  G(A).  Notice  that  the  dual  nodes  of 
each  pair  have  exactly  the  same  outgoing  paths,  with  the  exception  of  the  two  loop 
paths.  The  matrix  A  constructed  according  to  the  above  rules  is  caUed  the  Schwarz 
Enhanced  Matrix  (%W)  with  respect  to  the  partition  (3.3),  and  the  corresponding 
matrix  equation 

Ax  —  J  (3.9) 

is  caUed  the  Schwarz  Enhanced  Equation  (•%£),  where  /  is  the  dual  vector  of  /. 
Here  is  an  example  of  a  5  x  5  block  matrix  equation  and  its 


Ax  — 


An 

An 

A13 

Ah 

A2i 

a22 

A23 

A24 

A31 

A32 

A33 

A34 

A41 

a42 

A43 

A44 

*51 


M2 


A53 


A15 

An 

A35 


MS 


X, 

‘  /l  ' 

Xj 

h 

*3 

= 

/a 

J4 

h 

^5 

.  fs 

=  /• 
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Ax  = 


An 

Ai2 

0 

Al3 

Aj4 

0 

Ais 

A2i 

a22 

0 

A23 

A24 

0 

A2s 

A2i 

0 

a22 

A23 

A24 

0 

A25 

A3i 

0 

A32 

A33 

A  34 

0 

A35 

A*i 

0 

A<2 

A43 

A44 

0 

A45 

A*i 

0 

A42 

A43 

0 

A44 

A45 

Asi 

0 

Aj2 

A  53 

0 

A54 

A55 

II 

’  h ' 

X2 

h 

A 

fl 

X3 

= 

h 

X* 

u 

A 

U 

Xs 

- \ 

< 

=  /• 


If  we  merge  each  pair  of  dual  nodes  into  a  single  node,  and  fold  each  pair  of  paths 
from  the  same  dual  pair  into  a  single  path,  the  resulting  graph  is  identical  to  the 
original  one.  From  the  construction  of  it  is  easy  to  see  the  following  result: 


Lemma  3.1  If  vector  x  =  (xj,  x2,  •  •  • ,  X2*+i)t  is  the  solution  of  equation  (3.1),  then 
its  dual  vector  x  is  the  solution  of  $£&  Ax  =  f,  where  f  is  the  dual  vector  of  f. 


The  matrices  A2li2t,  i  =  1,  •  *  • ,  k  axe  also  called  overlapped  blocks.  If  two  Schwarz 
enhanced  matrices  B  and  C  of  the  same  matrix  A,  for  which  the  overlapped  blocks 
axe  Bfrji  and  C2l|2i,  i  =  1,  •  •  • ,  fc,  respectively,  have  such  a  relation  that  each  .B2,i2, 
is  a  submatrix  of  the  corresponding  C2,t2i,  then  we  say  C  has  more  overlap  than 
B.  This  overlap  is  closely  related  to  the  overlap  area  of  the  solution  regions  for  the 
subregions  mentioned  in  the  introduction.  As  we  have  shown,  for  the  continuous 
model  problem,  if  the  amount  of  overlap  increases,  then  the  convergence  rate  will 
increase  too.  For  the  matrix  model  we  have  a  similar  result. 


3.2  Equivalence  Theorem 

A  necessary  and  sufficient  condition  for  the  equivalency  of  equation  (3.1)  and  its 
SeP  (3.9)  is  given  in  this  section. 

Theorem  3.1  Let  A(A),  ^(A)  and  A(A„),i  =  l,---  ,2fc-fl  be  the  sets  of  eigenvalues 

k 

of  A,  A  and  A,,,  i  =  1,  •  •  • ,  2k  -f- 1,  respectively.  Then  X(  A)  C  A(A)  U  ( (J  A(A2,.2|)). 


IvVw 
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Proof.  Let  A  be  an  eigenvalue  of  A  and 

X  =  ^2fc+l) 

be  the  corresponding  eigenvector.  Substituting  x  into  the  equations  which  corre¬ 
spond  to  the  dual  nodes  P2i  and  Pa  >  we  have 

=  Ax*,  (3.10) 

HAjijXj  =  Ax2l.  (3.11) 

As  we  mentioned  in  the  last  section,  only  one  term  is  different  in  the  left  hand  sides 
of  the  two  equations.  Subtracting  (3.10)  from  (3.11),  we  have: 

Mi, 2,(52,  ~  *2.)  =  A(x*  -  x2i),  i  =  1,  •  •  • ,  k. 

*2.  ~  x2i  ^  0  for  some  i,  then  we  have  A  e  U  HMi.i,)-  If  A  £  |J  X(A2k2i), 
then  i2»  has  to  be  equal  to  x'2i  for  :  =  1, •••,*.  ’Therefore,  x  is  a  du’cd  vector  of 
x  =  (xi,  x2,  x3,  •  •  • ,  xu+l)T,  which  will  satisfy  equation 

Ax  =  Ax. 

Thus  A  6  A(A),  which  concludes  the  proof. 

Define  S$  (3.9)  as  equivalent  to  (3.1)  if  A-1  exists  and  the  solution  vector  x  is 
a  dual  vector  of  the  solution  x  of  (3.1).  Similarly,  we  say  that  SfeM  A  is  equivalent 
to  matrix  A  if  A~l  exists.  With  this  definition  and  the  result  from  Theorem  3.1  we 
have 

Theorem  3.2  If  a  matrix  A  is  a  Schwarz  enhanced  matrix  of  the  nonsingular  ma¬ 
trix  A,  then  the  following  are  equivalent: 

1.  Matrix  A  w  equivalent  to  matrix  A. 

2-  0  *  U  A(A2,.2.). 

i=i 

k 

Proof.  If  0  £  U  A(A2,,2,),  then  from  Theorem  3.1  we  know  A-1  exists.  Applying 

1=1 

the  strategy  used  in  the  proof  of  Theorem  3.1,  we  can  show  that  the  solution  x  of 
Ax  =  /  is  a  dual  vector  of  the  solution  x  of  Ax  =  /. 


I»5S 

l.'N 
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Now  we  show  that  0  ^  U  ^(^2«,2t)  is  also  a  necessary  condition.  Suppose  there 

1=1 

is  a  j  such  that  0  6  X(A2j,2j)-  We  know  that  A^jai  is  singular.  Let  the  rows  of 
matrix  A^h2j  be  r,,  i  =  1,  •  •  • ,  nr  There  is  a  constant  vector  a  =  (ax,  •  •  •  ,  anj  )T  /  0 
such  that: 

n> 

Y  a,r,  =  0. 

i=i 

Let  the  rows  of  the  Spfil  A  for  the  dual  nodes  Pjj,  P '2j  be  bj  and  c, ,  i  =  1,  •  •  • ,  n3 
respectively,  where 

b«  =  (^1 ,  ■  ■  ■  i  tjj—i  i  r,»  0)  *2,7+1 » ■  *  ■  i  ®2*+i  )i 

Ci  —  (^i, ‘••,62^  — 1,0,  I"ii  ®2;+l,  "  '  i  ®2fc+l  )• 

From  the  definition  of  a  £>EM,  the  only  differences  between  rows  b,-  and  c,  are  in  the 
positions  where  the  r,  are  located.  It  is  easy  to  verify  that: 


]Ta(bi  -  £  a/C/  =  0. 

i=i  i=i 

It  means  that  A  is  singular.  The  proof  is  complete. 

If  a  matrix  is  a  positive  definite  matrix  or  an  M-matrix5,  any  principal  minor  of 
this  matrix  is  also  a  positive  definite  matrix  or  M-matrix.  Thus,  we  immediately 


Corollary  1  Any  SEMof  a  positive  definite  matrix  A  is  equivalent  to  A. 


Corollary  2  i4ny  S]<Mof  an  M-matrix  A  is  equivalent  to  A. 


3.3  Splittings  of  Schwarz  Enhanced  Matrices 

From  the  results  of  the  last  section  we  know  that  the  solution  of  Ax  =  /  is  equivalent 
to  the  solution  of  Ax  =  ~f .  Here  we  will  analyze  the  application  of  some  classical 

5  Any  n  x  n  matrix  A  —  (a,;  )  with  <  0  for  all  «  /  j  is  an  M-matrix  if  A  is  nonsingular,  and 
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splitting  techniques  to  QM-  If  we  want  to  solve  the  matrix  equation  Ax  =  /  where 
.4  is  an  n  x  n  nonsingular  matrix,  we  consider  expressing  the  matrix  A  in  the  form 

A  =  M  -  N,  (3.12) 


where  M  and  N  are  also  n  x  n  matrices.  If  M  is  not  singular,  we  say  that  this 
expression  represents  a  splitting  of  the  matrix  A,  and  associated  with  this  splitting 
is  an  iterative  method 

Mx{k+l)  =  Nx(k)  +  /.  (3.13) 

Most  important  iterative  methods  can  be  described  from  this  point  of  view.  It  is 
also  called  a  linear  stationary  method  of  first  degree.  The  matrix  M~lN  is  called 
the  iterative  matrix  of  this  splitting.  The  convergence  behavior  of  this  splitting 
is  decided  by  more  specifically,  by  the  spectral  radius6  of  the  iterative 

matrix  M~*N  and  the  distribution  of  the  eigenvalues  of  this  matrix.  Particularly, 
we  call  a  splittings  a  convergent  splitting  if  p(M~1N)  <  1  holds.  If  the  diagonal 
entries  of  the  matrix  A  =  (a,;)  are  all  nonzero,  and  we  express  the  matrix  A  as  the 
matrix  sum 

A  —  D  —  L  —  U, 

where  D  —  diag(a\ti,a2,2,  •  •  • ,«(,,»)  and  L  and  U  are  ,  respectively,  strictly  lower 
and  upper  triangular  n  x  n  matrices,  then  the  following  choices 

Mpj  —  D ;  Npj  =  L  +  U, 

Mpa  =  D  —  L;  Npa  —  U 

give  the  point  Jacobi  and  point  Gauss-Seidel  splitting,  respectively. 

Let  A  =  Mpj  —  Npj  be  the  point  Jacobi  splitting  of  and  A,,,  =  D,  —  L,  — 
Ui ,  i  =  1,  •  •  • ,  2Jt  +  1  where  A,,,  is  the  diagonal  block  in  (3.3).  The  eigenvalues  of  the 
iterative  matrix  MpjNpj  and  the  eigenvalues  of  the  point  Jacobi  iterative  matrix 
of  A  have  the  following  relation: 

4The  spectral  radius  p(A)  of  a  matrix  A  ia  defined  aa 


p(A)  -  max  |  Aj 
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Theorem  3.3 

A (Mp)Npj)  C  \{Mp)Npj)  U  [U  A (D;f(L2i  +  U2,))}. 

i=i 

Proof.  Notice  that  the  iterative  matrix  MpjNpj  is  a  of  the  matrix  Mp)Npj 
with  respect  to  the  same  partition  of  A  in  (3.3).  Apply  theorem  3.1  to  this  point 
Jacobi  iterative  matrix,  the  proof  is  complete. 

This  theorem  shows  that  the  point  Jacobi  splitting  of  $eM  does  not  change  the 
performance  of  the  point  Jacobi  splitting  of  the  original  matrix. 

For  the  block  Jacobi  iterative  method  in  which  the  matrix  A  is  partitioned  into 
the  form  of  (3.3),  a  conclusion  similar  to  theorem  3.3  can  be  derived.  Let  A  be  the 
SfiM  of  the  matrix  A,  the  block  Jacobi  splitting  for  this  partitioning  of  A  be 

A  =  Mbj  -  Nbj  , 

and  the  corresponding  block  Jacobi  splitting  of  the  SeM  be 

A  =  Mbj  -  Xbj, 

where  Mbj  =  diag(Ait\,  •  •  • ,  Aj*+i,2*+i)  and  Mbj  is  of  the  matrix  Mbj •  Then 
for  this  special  block  Jacobi  splitting  we  have 
Theorem  3.4 

A  (MJjNbj)  =  KMbjNbj). 

Proof.  As  in  the  case  of  the  point  Jacobi  iterative  matrix,  this  block  Jacobi  iterative 
matrix  of  A  is  the  of  the  block  iterative  matrix  of  A.  Since  the  diagonal 
blocks  of  both  iterative  matrices  are  zero,  the  second  term  on  the  right  hand  side  in 
Theorem  3.1  vanishes;  and  thus  the  equality  holds.  If  we  split  the  diagonal  blocks 
as  A,,,  =  Mi  -  N,  and  let  M'gj  =  diag(M i ,  •  •  • ,  M2k+i ),  then  we  have  a  more  general 
result  them  Theorem  3.4: 
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Corollary  3 

HMH/'n'bj)  C  \(M'Bj-'N'Bj)  U  OJ  A  (Mi,-' Nii)). 

i=l 

Theorem  3.4  tells  us  that  this  particular  block  Jacobi  splitting  of  a  S$d  does  not 
improve  the  convergence  rate  when  we  compare  it  with  the  corresponding  splitting 
of  the  original  matrix  A. 

For  the  point  Gauss-Seidel  splitting  of  the  dual  matrices  we  also  have  a  similar 
theorem.  Let  A  =  Mpg  —  Npa  be  the  point  Gauss-Seidel  splitting  of  a  It  is 
not  difficult  to  see  that  Mpg  and  Npg  are  the  of  the  matrices  Mpg  and  Npc , 
respectively.  The  following  result  cam  easily  be  obtained  from  the  proof  of  theorem 
3.1. 

Theorem  3.5 

A (MphNpG)  C  \(Mp1gNpg)  U  (U  A ((£>*  -  L*)"1^))- 

.=i 

This  result  shows  that  if  we  relax  each  subproblem  only  once,  the  convergence 
factor  is  independent  of  the  overlap7.  It  is  interesting  that  this  non-positive  result 
has  a  very  useful  application.  When  we  use  the  multigrid  method  in  a  composite  grid 
environment,  one  important  question  is  how  the  overlap  will  effect  the  convergence. 
There  are  some  experiments  (see  [ST82])  which  show  that  the  amount  of  overlap 
does  not  affect  the  convergence,  and  thus  we  can  reduce  the  overlap  to  a  minimum8 
in  order  to  cut  down  the  cost  of  each  sweep.  This  theorem  gives  us  an  explanation. 
Another  extreme  is  fyM,  in  which  case  the  relaxation  is  carried  out  to  convergence  of 
the  subproblem.  Then  the  convergence  factor  is  exponentially  related  to  the  overlap 
(see  introduction).  We  might  expect  that  the  effect  of  the  overlap  on  the  convergence 
will  increase  when  we  increase  the  relaxation  sweeps  in  each  subproblem. 

So  far  the  splitting  techniques  we  have  discussed  axe  not  very  promising.  But 
this  picture  can  be  changed.  Consider  a  new  partitioning  of  let 

A  =  Ms  -Ns,  (3-14) 

7If  we  relax  the  subproblems  more  than  once  then  the  conclusion  is  not  valid 

*There  are  other  factors  which  must  also  be  considered.  For  example,  the  amount  of  overlap 
must  be  sufficient  to  ensure  the  accuracy  of  the  interpolation. 
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where  Ms  =  diag(Si,  S2,  •  •  • ,  5*),  and 

s,  -i*'1 


^1,1  ^1.2 

A24  Ajtj 

A2  i,2»  A2i,2t+1 

^2i,2i+2 

A2i+l,2i  A2,+i,2.+1 

A2t+l,2«+2 

A2«+2,2i  A2,+2,2t+l 

A2i+2.2«+2 

^2*. 2*  A2*,2Jfc+l 

1 

i  =  2,-  •  • ,  k  -  1; 


L  A2lc+l,2k  A2k+l,2k+l  \ 

We  define  the  splitting  (3.14)  to  be  a  Schwarz  Splitting  (•?>).  We  should  always 
relate  a  Schwarz  splitting  to  the  corresponding  partition.  A  different  partitioning 
will  lead  to  a  different  ■$>.  From  this  definition  we  know  that  a  is  essentially  a 
block  Jacobi  splitting  for  a  particular  partition  of  $eM,  and  is  the  Gauss- Seidel 
splitting  which  corresponds  to  this  partition. 

Very  often  problems  in  the  biological,  physical  and  social  sciences  can  be  reduced 
to  problems  involving  matrices  which  have  some  special  structure.  One  common 
situation  is  where  the  matrix  is  an  Af-matrix.  As  we  mentioned  in  last  section,  any 
SeM  of  an  M- matrix  is  equivalent  to  the  original  matrix.  Now  we  have  the  following 
result: 

Theorem  3.6  A  Schwarz  splitting  of  any  Schwarz  enhanced  matrix  A  is  a  conver¬ 
gent  splitting  if  A  is  an  M -matrix. 

Proof.  We  define  a  splitting  A  =  M  —  N  as  a  regular  splitting  of  A  if  M  is 
nonsingular  with  M  >  0,  and  N  >  0.  A  well  known  result  for  the  regular  splitting  is 
that  if  A’1  >  0,  any  regular  splitting  of  the  matrix  A  is  a  convergent  splitting[Var62]. 
It  is  clear  that  if  A  is  an  Af- matrix  then  the  *?>  is  a  convergent  splitting.  By  the 
comparison  theorem  for  M-matrices9,  we  can  also  derive  a  comparison  relation 
between  the  splittings  we  discussed  above. 


9Let  A  =  Mj  —  Nx  —  A/j  —  Nj  be  two  regular  splittings  of  A,  when  A~l  >  0.  If  N i  >  iVj  then 
the  spectral  radii  of  the  matrices  M~lNx  and  have  the  following  relation: 

p(MrlNi)  >  p(M{lN7) 


TtW'.JV'WW'V:  swSf 
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Theorem  3.7 

piM^Ns)  <  p(Mkj~lN'Bj)  <  ~p(Ms)Nbj)  <  p(MpljNPJ). 

Proof.  From  the  construction  we  have 

Npj  >  Nbj  >  N'bj  >  Ns. 

Application  of  the  comparison  theorem  concludes  the  proof. 

From  this  result,  we  know  that  $S  is  the  best  splitting  among  these  splittings. 
In  Chapter  5  we  will  derive  some  quantitative  results  for  $  of  the  model  problem 
of  elliptic  PDE’s. 


Chapter  4 


Template  Operators  and  Exponential  Decay 


In  the  last  two  chapters,  we  have  shown  that  or  can  be  applied  to  large 
classes  of  problems,  but  we  have  not  addressed  the  issue  of  how  to  recognize  the 
problems  for  which  is  most  suitable.  Now  we  will  reexamine  the  analysis  of  the 
model  problem  in  the  Chapter  2  from  a  different  point  of  view.  More  specifically,  we 
will  study  a  particular  behavior  of  the  inverse  of  the  same  operator,  the  exponential 
decay  phenomenon.  First,  the  Green’s  functions  for  the  model  problem  in  1-,  2- 
and  3-dimensional  solution  space  are  discussed.  The  relation  between  the  decay  of 
the  Green’s  function  and  the  convergence  speed  of  is  studied.  Then  in  Section  2 
the  decay  of  the  “discrete  Green’s  function”  of  a  matrix  is  studied.  Specifically,  the 
exponential  decay  of  a  banded  matrix  is  examined  in  detail.  We  have  found  that  the 
matrix  is  not  a  good  structure  to  study  this  problem.  In  Section  3,  a  new  structure, 
template  operator ,  for  a  linear  operator  in  a  finite  dimensional  space  is  developed. 
In  the  last  Section,  the  concepts  of  influencing  and  influenced  wavefronts  are  intro¬ 
duced.  Then  some  estimates  of  the  norm  of  the  wavefront  axe  presented.  These 
results  provide  a  theoretical  basis  for  determining  when  these  Schwarz  techniques 
can  be  used  successfully. 

4.1  A  Key  to  the  Success  of  SAM 

In  Chapter  2,  the  analysis  of  for  the  model  problem  shows  that  if  the  over¬ 
lap  increases  the  convergence  factor  of  *§1^/  improves  exponentially.  Moreover,  the 
higher  the  spatial  dimension,  the  bigger  the  improvement.  If  we  combine  these 
analyses  with  some  other  techniques,  or  •?>  can  be  developed  to  be  as  compet¬ 
itive  as  other  powerful  methods.  One  of  the  key  facts  which  makes  become 
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an  optimal  iterative  method  is  the  exponential  relation  between  the  overlap  and 
'■he  convergence  factor.  More  specifically,  when  the  overlapping  area  increases  the 
convergence  factor  decreases  exponentially.  For  a  Poisson  equation,  we  may  also 
use  the  decay  of  the  Green’s  functions  to  explain  this  result  more  intuitively.  As 
we  know,  the  solution  u(P)  for  the  model  problem  with  homogeneous  boundary 
condition  can  be  expressed  by  the  corresponding  Green’s  function  as 

"(P)  =  JaG(p,Q)f(Q)JQ. 

The  influence  of  the  forcing  function  f(Q)  on  the  solution  u(P)  is  decided  by  the 
value  of  the  Green’s  function  at  (P,  Q).  The  Green’s  functions  for  model  problems 
in  one-,  two-  and  three-dimensional  solution  space  are  as  follows  l: 


1.  One  dimensional  problem  (0  <  x  <  1): 

2.  Two  dimensional  problem: 


-Ox  for  x<0 
*)£  for  x  >  0 


G(x,y,00  =  r-ln 


2t  \A*  -  O’  +  (y  -  nT1 


3.  Three  dimensional  problem: 

G(x,y,z,Cr},0  =  - 7  1 


Let  P  represent  x,(x,y),  or  (x,y,x)  and  Q  represent  0(00,  or  (0^,0-  We 
may  observe  that  when  the  distance  between  P  and  Q  increases,  the  influence  of 
Q  on  the  solution  at  P  decreases.  If  the  overlap  is  increased,  the  artificial  bound¬ 
aries  are  moved  away  from  the  boundaries  of  the  subregions.  Consequently,  the 
contributions  from  the  error  on  the  boundaries  of  these  subregions  diminish  expo¬ 
nentially  with  the  increasing  distance.  This  observation  is  not  only  true  for  the 

1  Here  we  list  the  Green’s  functions  for  2-  and  3-dimensional  Poisson  equations  in  infinite  domains 
which  are  easier  to  explain.  The  Green's  functions  for  finite  domains  have  a  similar  decay  but  are 
more  complicated. 
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Poisson  equation.  Actually,  it  appears  in  many  physical  processes.  The  decay  of 
the  Green’s  function  is  just  a  mathematical  description  of  a  common  physical  phe¬ 
nomenon:  that  the  influence  between  two  points  will  weaken  if  the  distance  between 
them  increases.  This  suggests  that  and  can  be  applied  to  many  important 
applications  successfully.  Late  on,  we  will  explore  this  issue  mathematically. 

Another  observation  which  cam  be  obtained  from  these  Green’s  functions  is  that 
the  higher  the  spatial  dimension  the  faster  the  decrease  of  the  influence! 

The  inverses  of  the  matrices  A  associated  with  Poisson’s  equation  with  Dirichlet 
boundary  value  conditions  give  “discrete  Green’s  functions”.  A-1  should  be  a  good 
approximation  to  the  Green’s  function  (See  Birkhoff’s  book  “ Numerical  Solution 
of  Elliptic  Problems”).  Thus  a  similar  decay  behavior  should  be  true  for  these 
inverses2.  This  observation  motivates  us  to  seek  more  kinds  of  operators  for  which 
the  inverse  has  such  a  decay  property. 

4.2  Exponential  Decay  and  Banded  Matrices 

The  exponential  decay  of  the  off-diagonal  elements  of  the  inverse  of  a  diagonally 
dominant  tridiagonal  matrix  was  observed  decades  ago[Ker70].  There  were  several 
papers  which  discussed  the  topic  of  the  exponential  decay  of  the  inverse  of  a  banded 
matrix[Dem77],  [dB80],  etc..  In  summary,  an  estimate  of  the  form 

KjISC-r1*-'1  (4.1) 

was  given  in  these  papers,  where  a,-j  is  the  element  of  the  inverse  of  a  banded 
matrix.  The  claims  in  these  paper  axe  somewhat  misleading.  The  first  issue  is:  can 
we  guarantee  a  decay  from  (4.1)?  The  answer  is  no!  Without  any  further  conditions 
on  the  linear  operator  the  above  estimate  provides  us  with  no  useful  information. 
There  are  two  pitfalls  in  this  statement.  First  is  the  “constant”  C.  In  the  following 
example  we  will  show  that  C  can  be  so  big  that  an  exponential  increase  may  happen! 
The  second  pitfall  is  the  decay  term  7^-;'.  Even  though  we  do  have  7  <  1  here,  7 
is  a  function  of  the  order  of  the  matrix  n  in  question.  For  example,  7  2  1/(1  4-  n-2) 

2In  next  chapter,  we  will  prove  that  these  conclusions  can  be  derived  for  discrete  model  problems 


1 
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for  a  matrix  derived  from  the  model  problem.  So  we  will  have  7”  =  e“2,  a  nonzero 
limit,  as  n  — ►  00.  These  two  factors  lead  to  a  bound  which  is  so  weak  that  virtually 
anything  can  happen.  Here  are  some  counter  examples.  The  first  is: 


A  = 


1 

-2  1 
-2  1 
-2  1 


0 


0 


It  has  an  inverse: 


1 

2  1 

4  2  1 

8  4  2  1 


-2  1 
-2  1 


nxn 


0 


2n-l 


2" 


2  1 
2  1 


nXn 


The  off-diagonal  elements  of  the  inverse  actually  increase  exponentially.  People  may 
argue  that  this  matrix  is  not  stable  3.  Imposing  a  stability  condition  only  solves 
the  problem  of  the  big  constant  C.  The  second  problem  still  exists.  The  following 
example  is  derived  from  a  boundary  value  problem  for  the  one-dimensional  model 
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problem.  This  matrix  is  symmetric  positive  definite,  but  its  inverse  still  has  a 
growth  away  from  the  main  diagonal  along  some  rows  and  columns. 


'  0.92  -1 
-1  2  -1 
-12-1  0 
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Both  examples  have  shown  that  band  structure  does  not  guarantee  decay  in  the 
inverse.  Here  is  another  example.  The  following  matrix  is  not  banded(!),  but  its 
inverse  shows  an  interesting  decay.  If  we  arrange  the  elements  of  any  row  or  column 
of  this  matrix  on  a  circle  with  equal  spaces  and  think  of  the  original  diagonal 
element  as  a  central  element,  the  elements  on  this  circle  decay  away  from  this 
central  element.  This  matrix  is  derived  from  a  periodic  boundary  value  problem. 
All  these  examples  show  that  band  structure  is  not  a  good  predictor  for  the  decay  of 
its  inverse.  Furthermore,  band  structure  is  also  related  to  the  ordering  of  the  matrix 
in  question.  If  we  reorder  a  banded  matrix  as  a  random  sparse  matrix,  the  decay 
still  exists,  but  it  can  not  be  described  in  terms  of  distance  from  the  main  diagonal. 


4.2.  Exponential  Decay  and  Banded  Matrices 


41 


This  suggests  that  the  decay  is  essentially  caused  by  a  locality  or  compactness  of 
an  operator  which  is  independent  of  the  bandness  of  the  matrix  or  the  ordering  of 
the  variables. 

'  2.1  -1  -1  ' 
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Particularly,  for  operators  in  a  high  dimensional  solution  space  or  for  operators 
which  are  derived  from  the  finite  element  method,  the  concept  of  band  can  no 
longer  characterize  the  locality  of  the  operator. 

The  second  issue  is  how  to  define  a  concept  of  distance  between  nodes  which  is 
meaningfully  related  to  their  influence  upon  each  other.  Essentially,  the  purpose  of 
introducing  the  concept  of  exponential  decay  is  to  characterize  the  decreasing  influ¬ 
ence  as  the  distance  between  two  nodes4  increases.  Using  a  matrix  data  structure, 
the  exponential  decay  is  characterized  by  the  decrease  of  the  off-diagonal  elements. 
For  one-dimensional  problems,  this  decay  gives  a  good  characterization  of  decreas¬ 
ing  influence.  But  for  higher-dimensional  problems,  this  characterization  is  not 
4  Here  we  adopt  the  terminology  node  and  distance  from  the  graphical  representation  of  a  matrix 
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adequate.  For  example,  consider  the  Dirichlet  problem  of  a  Helmholtz  equation 

A u  —  au  =  /,  x  £  [0, 1]  x  [0, 1], 

«  lr  =  g, 

where  a  =  0.00278.  Let  the  grid  size  be  1/6.  Using  an  nine- point  stencil,  we  can 
construct  a  36  x  36  diagonally  dominant,  banded,  positive  definite  matrix.  We  list 
the  first  column  of  the  inverse  matrix  in  the  following  table.  Each  item  in  the  table 
has  been  multiplied  by  a  factor  of  104.  The  superscript  of  each  number  in  the  table 
is  the  row  number  of  the  elements  in  the  inverse. 
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It  is  easy  to  see  that  the  off  diagonal  elements  decrease  in  an  oscillatory  manner, 
since  the  enforced  ordering  has  destroyed  the  topological  relationship  among  these 
variables.  Measuring  the  distance  between  two  nodes  by  the  difference  of  the  row 
and  column  numbers  here  is  not  suitable  to  characterize  the  influence  between  them. 
But  if  we  imagine  a  center  in  the  upper  left  corner  of  this  table,  the  elements  decay 
monotonically  and  exponentially  in  a  wavefront  form. 

After  a  careful  study  of  these  issues  and  counter-examples,  we  have  found  that 
the  abstract  data  structures  vector  and  matrix  prevent  us  from  seeing  important 
features  in  many  physical  problems.  Actually,  in  a  recent  paper,  Demko  [DMS84] 
had  noticed  this  limitation  of  the  matrix  structure.  A  linear  operator  in  finite 
dimensional  space  is  often  a  discrete  approximation  of  a  continuous  operator  for 
some  particular  application.  Instead  of  solving  the  original  problem  in  the  entire 
solution  region  ft  (say  in  R*,  where  k  usually  is  1,  2  or  3),  we  choose  only  a  finite 
number  of  nodes  (or  points)  o\,  02,  •  •  ■ ,  on  in  ft  and  try  to  find  the  solution*-  on  these 
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nodes.  A  discrete  approximation  of  the  original  problem 

.4 1  =  b 


is  then  formed,  where  x  =  {x,}  is  the  set  of  unknowns  defined  on  the  set  of  nodes 
O  =  {o,}  and  A  is  the  discrete  approximation  of  the  original  continuous  operator. 
On  the  one  hand,  x,  is  a  component  of  an  n-dimensional  vector.  On  the  other, 
each  Xi  is  also  associated  with  a  node  o;  of  the  solution  region  Q  in  Rk  or  some 
other  solution  space.  In  an  abstract  vector  space,  the  elements  x,  are  given  a  forced 
linear  ordering  which  in  general  cannot  adequately  represent  their  positions  in  the 
solution  space.  The  corresponding  data  structure  for  the  linear  operator  in  this 
space  is  represented  by  a  rectangular  matrix.  Again,  the  positions  of  the  coefficients 
in  any  one  of  the  rows  or  columns  have  little  relation  with  the  positions  in  the 
solution  space.  Generally  speaking,  these  abstract  data  structures  have  successfully 
represented  the  topology  of  the  problems  in  one  spatial  dimension  and  axe  a  good 
theoretical  tool  for  many  analyses.  But  for  operators  which  are  derived  from  higher¬ 
dimensional  problems,  the  enforced  linear  ordering  of  the  unknowns  in  the  matrix 
structure  has  destroyed  the  proximity  relations  of  the  variables  and  the  compactness 
of  the  operator.  This  is  an  example  of  how  our  thinking  and  theory  has  been 
influenced  by  sequential  filters  which  have  disfigured  many  physical  features  in  a 
particular  application.  Since  sequential  arithmetic  and  two  dimensional  scratch 
paper  were  the  means  to  study  mathematics  a  few  hundreds  years  ago,  it  is  not 
surprising  that  people  proposed  the  matrix  data  structure  for  the  linear  operator 
at  that  time.  Now,  the  parallel  age  has  come.  It  is  the  time  to  free  ourselves  from 
this  filter. 

4.3  Template  Operators 


& 


In  the  course  of  this  study,  J.  Oliger  suggested  finding  a  new  structure  which  would 
preserve  the  topological  structure  of  the  original  problem.  The  discussions  led  to 
a  new  vector  space  —  Template  vector  space  T‘  and  a  new  structure  of  the  linear 
operator  —  Template  operator.  Most  of  the  linear  operators  in  finite-dimensional 
space  are  derived  from  discrete  approximations  of  continuous  operators.  The  main 
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idea  of  these  new  structures  is  to  maintain  the  primary  topological  structure  of  the 
original  problem  in  the  discrete  approximation.  In  this  way,  many  characteristics 
of  the  original  continuous  problem  can  also  be  easily  seen  in  the  finite  dimensional 
approximation. 

Let  Oi,02,-*-,o„  be  n  nodes  in  a  solution  region  ft  which  usually  resides  in 
Rfc,  k  =s  1,2  or  3  5.  These  nodes  usually  are  the  positions  on  which  the  discrete 
approximations  of  a  continuous  problem  are  sought.  Let  O  denote  the  set  of  all 
nodes  o,. 

Definition.  1  A  template 

T  =<  0l,02,-  -,0„  > 

is  a  topological  structure  of  the  set  O  in  which  all  nodes  o,  maintain  the  same 
proximity  relation  with  each  other  as  they  have  in  the  solution  region  ft. 

Intuitively,  T  is  the  pattern  of  the  distribution  of  the  set  O.  For  example,  there 
are  four  templates  in  Fig.  4.1.  They  all  have  the  same  number  of  nodes,  but  they 
are  associated  with  four  different  topological  structures.  The  first  three  come  from 
R1,  R2  and  R3  respectively.  The  second  and  fourth  templates  are  both  from  R2,  but 
they  have  different  topological  relationships  among  the  nodes.  We  consider  them 
to  be  different  templates. 

Given  a  template  T,  construct  n  Cartesian  products  of  R*  and  o,,  5,  =  R*  x  o,, 
i  =  1.2,  •  •  •  ,n,  where  R*  is  an  s-dimensional  vector  space.  Now  define  set 

7*  =  Sx  x  S2  x  •  •  •  x  Sn. 

Each  element  in  7J  consists  of  n  ordered  pairs6: 

(<  Xi,Oi  >,<  x, ,Oj  >,•■•,<  x„,  On  >)• 

If  there  is  no  confusion,  we  may  also  abbreviate  the  notation  as 

{  Xox  ,  Xoj ,  •  •  •  ,  X0n  } , 

5  Actually,  Q  can  exist  in  any  space. 

®To  simplify  the  typesetting,  we  will  not  express  alt  the  following  concepts  by  their  real  topological 
picture,  but  by  a  linear  array  of  n  ordered  pairs. 
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where  x,  or  x0i  axe  3-dimensional  vectors.  In  physics,  x,  or  x0<  can  be  interpreted 
as  the  state  variables  for  node  o,.  Each  x,  or  x0i  is  defined  on  the  node  o,  on  the 
template  T. 

A  template  vector  space  which  we  may  abbreviate  as  template  space  over 
R‘  is  the  set  with  operations,  addition  and  scalar  multiplication,  which  are 
defined  as  follows:  let 

X  =  (<  x\i  °1  >)  <  Xj,  Oi  >,  •  •  •  ,  <  X„,  on  >), 

=  {*01  )  x<>2  i‘"t  xo„  }  t 


y  =(<  yi,Oi  >,<  ya.Oa  >,•••,<  yn,  On  >), 

{y0, » y* » ■  ■  ■  >  y<> „ }  > 


and 


x  +  y  =(<  xi  +yi,0!  >,•••,<  xn  +y„, o„  >), 
—  {xoj  +  ,••• ,  x0n  -+■  y„n  } , 


OX  =  (<  OXj,Oi  >,<  QXj,02  >,•••,<  QX„,On  >), 

=  {aXoj.OXo,,---,©!^}. 

Under  these  definitions,  7^*  is  a  linear  space. 

Each  element  x  €  7^,*  is  called  a  template  vector. 

For  example,  Fig.  4.2.  presents  four  template  vectors  which  are  associated  with 
the  corresponding  template  in  Fig.  4.1. 

Let  £  denote  the  summation  over  all  nodes  o,-  6  O.  Define  the  operation  of 

o.€C 

scalar  product  of  two  template  vectors  x  and  y  as  follows: 

(*,y)r  =  £(x,,,y0i), 
o.eo 


where  (x0,,y0))  is  the  scalar  product  of  two  s- dimensional  vectors.  We  may  in¬ 
tuitively  think  of  this  operation  as  matching  the  two  template  vectors  together, 
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forming  the  scalar  products  for  the  matching  pairs,  and  then  summing  the  prod¬ 
ucts.  By  the  length  (or  norm)  of  a  template  vector  in  this  space  we  mean  the 
quantity 

II  1  11=  \J(x,x)t- 

It  is  easy  to  see  that  T‘  is  a  Euclidian  space  under  this  operation. 

In  order  to  simplify  the  notation,  the  following  discussion  will  assume  s  =  1.  Let 
Tn  denote  T*.  Thus  there  is  only  one  state  variable  on  each  node.  In  the  appendix, 
we  will  remove  this  restriction  and  generalize  to  other  spaces. 

Given  a  template  vector  space  Tn,  a  template  operator  space  over  Tn  can  be 
introduced  as  follows:  let 

Tn  =<o1,o2,...,o„  > 

be  the  template  of  Tn.  Construct  n  Cartesian  products 

Qi  =  ^r»  ^  Oi,i  1,  *  *  ,  Tl. 


Let 


C  =  Qi  xQ2x-x  Qn. 


Each  element  L  €  £  consists  of  n  ordered  pairs 


L  =  [<  Ri,  Ol  >,<  i?2,  02  >,-•*,<  Rn,On  >]/ 


or  simply 

L  =  [Ro^Ro,,-  •  •  ,R0n]i 

where  Ri  or  R0i  is  a  template  vector  in  Tn  associated  with  the  node  o,-  in  the  template 
T  7 

A  template  operator  space  over  Tn  is  the  set  C  with  two  operations  addition 
and  scalar  multiplication  which  are  defined  as  follows:  let 

L\  =[<i?l,Ol  >,<  Rj,  Oj  >,•••,<  Rny  on  >]/, 

—  [Roi  I  R&1 1  ’  '  5 

7 Here  the  subscript  /  of  L  meajis  this  expression  is  a  left  form  of  a  template  operator.  Late,  we 
will  introduce  its  rtght  form 
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and 


Li  =  [<  R\,  <?i  >,  <  Ri,  oi  >,•••,<  Rn,  on  >]/ 
[Rai  1  Rot  l  *  *  *  >  Ron  ]  1 1 


L\  +  Li  =  [<  R\  +  R\i  0\  >,•••,  <  Rn  +  Rni  On  >]/ 

=  [iZoj  +  ^ ,  •  •  • ,  Ron  +  R-on  ]  / , 


aL  =  [<  atRi,oi  >,<  aRi,oi  >,•■•,<  aRn,on  >]/, 

—  [o R0l .  aRo, ,  *  *  • ,  oJ?0n]/. 

Under  these  definition,  £  is  a  linear  space. 

Let 


X  —  {x0,  ,  Xoj  ,  •  •  • ,  x„n  } 

be  a  template  in  7^  and 

L  =  [/2<>j ,  Roi ,  •  •  • ,  -Re,]/ 

be  a  template  operator  in  C.  Define  the  operation  of  L  on  x  as  follows: 


y  =  Lx 

[Ro>  t  Roil  > 

=  {(Jl0l,x)r,(i2o3,x)r,---,(-Ro„,x)r} 

where  (R0i,x)  denotes  the  scalar  product  of  the  template  vectors  R0t  and  x.  We  see 
that  y  is  again  a  template  vector  in  Tn.  Under  this  definition,  L  is  a  linear  operator, 
mapping  Tn  into  Tn.  In  another  words,  L  maps  x  to  y,  and  y  is  the  image  of  x  under 
this  mapping.  L  is  called  a  template  operator  of  the  template  space 

Rot  is  called  an  operating  template  or  input  template  of  the  template  op¬ 
erator  L  for  node  o,.  Let  )  denote  the  component  of  R0<  associated  with  the 
node  Oj  so  that 

R0 ,  =  (<  lo,{° i)i°i  >.<  lo,(oi), 01  >,■••,<  l0,(on),on  >)■ 

/0i(o,)  is  called  the  center  element  of  the  operator  template  for  node  0,. 
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Km 
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U°«) 


U<*) 


^oj  (Og) 


i- - 

*os(°i)  UM  UM 

An  in-web  for  node  5  in  second  template  space 

*<»iM  ^(og) 

M<*») 
<*(«•) 


An  In- Web  for  Node  1  in  Fourth  Template  Space 
Figure  4.3:  Two  in-webs 


Given  an  operating  template  Ri,  we  may  construct  a  directed  graph  in  which 
for  each 


IXoi)  *  0 


we  put  a  path  from  node  Oj  to  node  o,-.  This  graph  is  a  picture  of  how  the  operating 
template  collects  the  information  from  the  nodes  which  have  non  zero  coefficients 
l0,(°j)  and  forms  the  value  of  the  image  y  at  node  o,.  We  call  this  graph  represen¬ 
tation  of  an  operating  template  an  in— web  of  the  template  operator  for  node  o<. 
Figure  4.1  shows  two  examples  of  in-webs.  The  in- web  in  the  first  picture,  with 
four  paths  to  node  5  from  its  neighbors,  corresponds  to  a  5-point  stencil  for  node 
five. 

The  structure  of  the  left  form  of  a  template  operator  for  the  second  template 
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Figure  4.4:  A  template  operator  in  the  second  template  space 
space  in  Fig.  4.1  has  the  following  arrangement: 

Ror  Rof 
Ro<  Rot  Rot 
Ro,  Ro i  Ro3 

By  expanding  Ra>  to  unveil  its  internal  structure,  we  may  obtain  the  the  picture  in 
Fig.  4.4: 

If  we  map  the  template  vector  to  a  conventional  vector  (using  the  same  ordering 
as  the  nodes  have),  then  the  template  operator  corresponds  to  an  n  x  n  matrix  as 
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follows: 


loM  ^01(02) 

lo,(oi)  ^(O^) 


Li  (°n) 

UOn) 


LUot)  U°*)  ...  uojnxn 

It  is  easy  to  see  that  the  operating  template  R0i  for  node  o<  corresponds  to  the  row 
i  of  matrix  A.  Now  let 


=  (<  l»Ol  >,<  0,Oj  0,  On  >), 


C2  =  (<  0,  Oj  >,  <  1, 03  >,  •  •  • ,  <  0,  on  >), 


Cn  =  (<  0,Oi  >,<0,O3  1,Ob  >). 

It  is  easy  to  see  that  {e*,*  =  1,  •  •  •  ,n}  is  the  basis  of  space  Tn. 

Applying  L  to  the  basis,  we  have 

Lei  =  {  •  •  • ,  I»„(o»)} }  i  =  1,  •  •  • ,  n. 

Let  C0,  denote  Le i  =  1,  •  •  • ,  n.  C0,  are  another  set  of  templates  which  can  be  used 
to  represent  the  template  operator  L.  We  call  C0i  the  image  template  or  output 
template  for  node  o,,  because  it  is  the  image  of  e,-  under  the  mapping  L.  We  may 
also  interpret  it  as  the  distribution  of  the  output  for  t  unit  source  at  node  o,  under 
this  mapping.  Again,  0,  is  called  the  center  of  this  image  template  and  /0,(o,)  the 
center  element  of  the  image  template. 

We  can  see  from  the  definition  that  the  image  template  of  the  node  0,  corresponds 
to  the  i-th  column  in  the  corresponding  matrix  we  mentioned  above. 

Analogous  to  the  in-web,  another  graph  called  the  out-web  for  node  0,  may  be 
constructed,  in  which  we  include  a  path  from  0,  to  o;  whenever  I0,(°j)  ^  0  in  an 
image  template.  In  this  graph  all  paths  start  at  node  o,-  and  indicate  which  nodes 
are  directly  influenced  by  the  node  o,.  The  pictures  in  Fig.  4.5  are  two  out-webs  for 
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W<*) 


loi(oa) 


*- - J. - 4 

loi(&i)  los(°z) 


An  out-web  for  node  5  in  second  template  space 

Ux{oi)  ^,(os) 


^oj  (°e) 

«*(*) 


An  Out- Web  for  Node  1  in  Fourth  Template  Space 
Figure  4.5:  Two  out-webs 


the  node  o5  of  the  second  template  space  and  for  the  node  oi  in  the  fourth  template 
space  in  Figure  4.1,  respectively. 

Now  the  right  form  of  a  template  operator  can  be  defined  as: 


or  simply 


L  —  [<  Cl,  Ox  >,  <  Cj,  Oj  C„,  on  >]r 

L  “  [C<>1 )  ^OJ  i'"t  Con]r- 


Then  the  right  product  of  a  template  vector  and  a  template  operator  can  be  intro¬ 
duced  as  follows: 


x  —  y  L  —  y\Oo\ » ,  ■  •  • ,  Ojn]r 

=  {(y>001)r,(y,Ooj)r,”-  »(i/>0«,b)t}- 


It  is  interesting  to  compare  this  operation  with  the  corresponding  operation 
x  =  yA  in  a  matrix  structure.  There,  the  matrix  A  keeps  the  same  form  while  the 
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vector  y  needs  to  be  transposed  as  a  row  vector.  Here,  the  template  vector  y  keeps 
the  same  form,  but  the  operator  needs  to  be  expressed  as  a  right  form  in  order  to 
obtain  an  operation  consistent  with  the  left  form.  The  rule  for  deciding  when  we 
should  use  the  right  or  left  form  of  a  template  operator  is  simple:  if  L  appears  on 
the  right  of  the  operand  ,  the  right  form  is  used  and  vice  versa.  We  will  see  that 
with  these  two  kinds  of  products  the  multiplication  of  two  template  matrices  can 
be  expressed  very  simply. 

Let 

A  [Hflj  ,  >  ^On]l 

=  [Coi  I  C'ojj  •  *  •  )  C0„]r» 

B  -WL./C-./O 

=  |c”„c;.--,cLJ. 

and 

d  =  ab  = 


It  is  easy  to  verify  that 


d  =  ab  =  a[c;,c;,-.,cj, 

=  [AC'n,AC'^---,AC‘Jr 

=  [•RojjHo,,-*-  ,Ron]lB 
=  •  •  ,RonB]i 

K'o,  =Ro,B 

=  R°.[C'0l,---,C'Jr 

C\  —  AC'o, 

—  [-Rot,-'*  ,Ron]lC'0t 
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Each  element  in  the  product  of  two  template  operators  is  a  scalar  product  of  a 
operating  template  of  A  and  am  image  template  of  B.  An  operating  template  of  D 
is  a  left  product  of  the  operating  template  of  A  for  the  same  node  and  the  operator 
1 3.  An  image  template  of  D  is  a  right  product  of  the  operator  A  and  the  image 
template  for  the  same  node.  It  is  easy  to  see  the  relationship  with  the  operation  of 
multiplying  a  row  and  a  column  in  the  matrix  structure. 

The  transpose  of  any  linear  operator  L  can  be  simply  obtained  by  swapping  the 
input  template  and  output  template  for  each  node. 

LT  =  [C'ojjC’o,,  •  •  • ,  CoJj 

=  [iZoi  ,  Ro^i  •  •  •  1  iUr. 


For  a  self-adjoint  operator,  we  have  R0<  =  C0l,o,  €  O. 

Corresponding  to  the  concept  of  diagonal  dominance  in  row  or  in  column  for  the 
matrix  structure,  a  template  operator  is  center  dominant  in  output  or  in  input  if 

I  Uo.)  |>  E  U«i)<  ^  0.6^ 


I  lo,(oi)  |>  E  for  all  Oi  G  O, 

Oj  €0 

Oj?Oi 

is  true. 

Although  we  have  also  developed  other  new  concepts  for  template  operators,  we 
will  not  present  them  here,  since  they  axe  not  directly  related  to  the  discussion  in 
the  next  section.  The  interested  reader  can  refer  to  the  Appendix  B. 

4.4  Estimates  of  the  Decay  of  Inverse  Operators 

In  this  section  we  will  concentrate  on  the  discussion  of  a  decay  phenomenon  for 
the  inverse  of  a  sparse  template  operator.  As  with  the  matrix  structure,  if  there 
are  only  few  non-zero  elements  in  a  template  operator  we  call  it  a  sparse  template 
operator. 
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•  stands  for  first  influencing  wavefront  of  the  V  node 
©  stands  for  second  influencing  wavefront  of  the  *sP  node 
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Figure  4.6:  First  and  second  influenced  wavefronts 


To  describe  the  decay  phenomenon  more  precisely,  we  need  to  introduce  two 
important  concepts:  kth  influencing  wavefront  and  kth  influenced  wavefront  of  the 
node  o,  in  a  template  operator.  Let 


be  the  center  of  the  influenced  wavefront  for  node  o,, 

W£l(o,)  =  {Oj  |  l0](Oi)  ^  0,  Oj  /  o,} 

be  the  set  of  all  nodes  except  node  o,  for  which  the  corresponding  elements  in 
the  output  template  C0,  are  non-zero.  wii!(ot)  is  called  the  immediate  or  first 
influenced  wavefront  of  the  node  o,-.  The  kth  influenced  wavefront  of  the  node  o, 
can  be  defined  recursively: 

Definition.  2  The  kth  influenced  wavefront  of  the  node  Oi,  W^j(oi),w  a  set  of 
nodes  oi  defined  as  follows: 

w£!(«i)  =  U  wffite)  -  U  wSJ(oi) 

*=0 

Fig.  4.6  shows  the  first  and  the  second  influenced  wavefronts  of  a  node  o,. 
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Likewise,  let 


be  the  center  of  the  influencing  wavefront  for  node  o,, 

w£,(og)  =  {Oj  I  l0i(oj)  ^  0,o:  ^  o,} 

be  the  set  of  all  nodes  except  node  o,  for  which  the  elements  in  the  input  template 
R„ .  are  non-zero.  W,^(o,)  is  called  the  first  influencing  wavefront  of  node  o, . 

Definition.  3  The  kth  influencing  wavefront  of  the  node  oit  W,***(o,),w  a  set  of 
nodes  o/  defined  as  follows: 

*£>(<*)  =  U  wSI’K )  -  U  )• 

o,€W<*-,)K)  »=° 

It  is  clear  from  the  definition  that 

wsjwnwaio,)-*, 

vt&wn  = 0, 

and* 

O  =  U  W“(o.)  =  U  wY>(0i)- 

;=°  ;=o 

where  p  ^  n  and  the  p  and  q  are  the  largest  integers  for  which  the  W^|(o<)  and 
W 3X6  n°t  empty.  Here  p  and  q  both  depend  on  node  ot.  To  simplify  the 
notation,  we  will  not  explicitly  express  this  dependence.  For  many  important  appli¬ 
cations,  Wiil(oi)  and  W^o  ,)  are  compact  in  the  sense  that  the  first  influenced  and 
influencing  wavefronts  are  located  in  a  small  area  in  the  solution  space.  For  P.D.E. 
applications,  the  number  of  elements  in  W,-^(o,)  or  Woil(oi)  is  typically  bounded 
by  a  constant  which  is  independent  of  the  mesh  size.  We  often  call  this  property 
the  locality  of  the  operator.  In  terms  of  the  graph  representation,  we  have  the 
following  results  which  will  be  useful  later. 


Here  we  assume  the  operator  is  irreducible 
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Lemma  4.1  The  k-th  influenced  wavefront  is  the  set  of  nodes  to  which  the  shortest 
path  from,  node  o,  is  of  length  k. 

Similarly,  the  k-th  influencing  wavefront  w  the  set  of  nodes  from  which  the 
shortest  path  to  node  o,  is  of  length  k. 

Let  V(o,)  and  U{oi)  be  the  first  influencing  and  influenced  wavefront  for  node  o, 
of  the  fc-th  power  of  template  operator  L.  We  have9: 

Lemma  4.2 

U  w\i\ o.)  =  V(«), 

J=1 

and 

U  W«(o.)  =  W(Oi). 

i= l 


The  two  kinds  of  wavefronts  characterize  how  influences  are  propagated  to  or 
from  other  nodes  graphically. 

The  identical  template  operator  I  is  as  follows: 


•f  —  [I<n  »  ^07  >  *  *  '  >  [L>|  )  -foj  »  ‘  )  -L>n]r 


where 

I«  =  =  t,J  I  o,  €  O}. 

The  template  vector  I0i  is  a  structure  corresponding  to  the  base  vector  in  a  ordinary 
vector  space;  both  have  only  one  element  which  is  1  and  the  rest  are  zeros. 

Let 

L-'  =«,<,• --.JO 
be  the  inverse  of  the  operator  L  where 

K'  ={';'(»<).••  ••'.■> .)>. 

c~'  ={/;■(«.), 


9Here,  we  ignore  the  possibility  of  cancellation  producing  new  zeros. 
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are  the  input  and  output  templates  of  the  node  o,  for  I-1,  respectively.  It  is  easy 
to  see  that  the  input  template  R~l  of  the  inverse  is  a  discrete  Green’s  function  for 
node  o,.  (f?”1,/)  is  the  solution  of  the  equation 

Lx  =  f 

on  node  o,.  Intuitively,  if  I  is  a  finite  approximation  of  a  linear  differential  operator 
and  the  mesh  is  fine  enough  the  following  is  true: 

lG(P,Q)f(Q)drQ 

and 

~  G(oi,Oj)AQ. 

Let 

II  K‘  III  =  E  (C(«i ))’. 

o3eO 

II  c.-'  Ill  =  E  (C>(«.))3 

||  .R"1  llj  approximates  the  norm  of  the  Green’s  function 

JnG(P,Q)2drQ 

if  the  linear  operator  is  a  discretized  linear  differential  equation  and  the  mesh  size  is 
small  enough.  The  norm  ||  R~l  ||  is  bounded  by  a  constant  which  is  independent  of 
the  mesh  size  if  this  finite  approximation  is  stable.  The  structures  of  the  continuous 
and  discrete  operators  axe  more  consistent  for  the  template  space  than  for  the 
traditional  vector  space  for  the  matrix  structure. 

Let  xa  be  the  characteristic  function  on  the  subset  A.  Construct  two  new 
sequences  of  template  vectors 

Qi  *  =  Vw<*)(0i) 0  Ro,1 

=  {n(fc)(o1),n(*)(o2),---  ,n(A)(o/v)}, 
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i  =  1 

=  0  co.1 

=  {m(*)(o1),  m(k)(o2),  •  •  • ,  mw(oN)}, 


k  =  l,---,p, 
i  = 


where  p  and  q  are  the  largest  integers  for  which  W,-^(ot  )  or  W^|(o,  )  axe  not  empty, 
and  N  is  the  number  of  the  nodes  for  this  template  operator.  By  the  above  defini¬ 
tion,  we  have: 


n{k)(oj)  =  | 

\  ';■(»>). 

O,  6  w2>(0,), 

1 

elsewhere , 

m(*>(oj)  =  | 

\  <;>>. 

O;  e  wi2(o<), 

1 0, 

elsewhere. 

In  another  words,  P/k)  only  gathers  the  elements  of  C~f  on  the  fc-th  influenced 
wavefront  and  gathers  the  elements  of  /2”1  on  the  k-th  influencing  wavefront. 
Let 

llo!‘,ll!  =  Z  (VM)1, 

II if1  Ill  =  z  (C, '(».))’• 

We  have  the  following  theorem: 

Theorem  4.1  If  there  is  an  integer  k  >  2  such  that  is  not  empty,  then 

for  any  such  integer  k  the  following  inequality  holds: 

II  «!*’  Ill  <  7*11  r;,'  ll’  (4.2) 

where  7  <  1  and  depends  only  on  the  condition  number  of  the  operator  L.  A 
corresponding  result  for  the  kth  influenced  wavefront  is  also  true: 

II  /f1  ll!  <  7*11  C-'  II’.  (4.3) 
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Proof.  The  proofs  of  (4.2)  and  (4.3)  are  parallel.  Here  we  present  the  proof  for 
(4.2).  From  the  definition  of  an  inverse,  we  have: 

LC 

Construct  two  sequences  of  template  vectors: 

R(k)  =  £  pij) 

>=*+1 

=  {/(*)(o1),/W(o2),...,/<*)(oAr)}, 
wjk)  =LR\k) 

=  {R0l, Ro), •  •  • , R0ff} Ri  ^ 

=  {z(*)(o1)>z(‘)(o3),.-.,zW(oAr)}, 

k  =  !,•••• 


By  the  definition,  we  have: 


0,  Oj  €  U  w2(o,), 

/(*)(„,)  =  I  »=! 

U  »«(«,). 

I  *i=*+l 

0,  Oj  6  U1  K4J(0b-), 

*“'(<>()  =|  (A,,^1).  6  wj2(o.)u  wiir’fo.), 

0,  °;€  6  WjS(«i). 

<i=*+2 

It  is  clear  that  the  non-zero  patterns  of  and  W,  -I*  do  not  overlap.  Now, 

II  *!*'  II  =  I  £  II  /f >  ll’r 

J=*+t 

=  l|£-'£«l*,ll 

<  II  £-■  llll  w;(‘)  II 

<  II  i'1  llll  ~21  n 

=  ||  i-1  mi  || 

<  kl  ||  (fi!11  -  fl!*-11)  || 

<  ll  p™  +  ff'"  II  • 


Id 

1 
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The  rest  of  the  proof  can  be  completed  by  the  following  lemma  [Dem77]: 

Lemma  4.3  Let  {a,},>o  be  a  sequence  of  nonnegative  numbers.  If  there  is  a  K  >  0 
so  that  52.>fca«  <  Kak-i  for  all  k  >  1  ,  then  at,  <  [K/(K  +  1  )]*-*o  for  all  k  >  1, 
where  s0  =  £.>o  a«  ■ 

If  a  matrix  has  only  real  eigenvalues  or  is  positive  definite,  a  stronger  result  can 
be  obtained.  Here  we  present  the  result  for  a  positive  definite  template  operator. 

Theorem  4.2  If  L  is  a  positive  definite  template  operator,  then 


Proof.  The  proof  of  (4.4)  is  rather  easier.  Apply  the  optimal  Chebychev  iteration 
to  the  equation 

LTx  =  I0i 

and  choose  the  initial  guess  =  0,  where  I0i  is  the  output  template  of  the  identity 
operator  for  node  o,-.  We  know  that  the  solution  of  this  equation  is  the  input 
template  R~l  of  L~l  for  node  o,.  The  k-th  iteration  x^  has  the  error  bound 


Since  z^°l  =  0,  z^(o,)  =  0,  o<  €  O  —  V(o,)  and 

il  *!*’  ii  <n  *“>  -  il  ■ 

Using  the  same  argument  with 

Lx  =  Ioi  ? 


we  can  prove  (4.5). 
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These  two  bounds  axe  rather  pessimistic,  but  we  can  still  derive  some  very  useful 
results  from  them.  For  example,  apply  the  Crank-Nicolson  scheme  to  a  parabolic 
boundary  problem 

^+Cu  =  /,  lEfl*  (0,T|, 

lu  =  g,  X  e  rn  X  (0 ,T], 

u(z,0)  =  u°(x), 

where 

£u  + 
lu  =  cu  +  d^, 

a,  >  0,  b,  c  and  d  >  0.  It  is  known  that  the  condition  number  of  the  resulting  linear 
system  has  the  bound  [Kuz87] 

kl  <  Crh~ 2 

where  r  is  the  time  step  and  ft  is  the  spatial  mesh  interval.  If  we  choose  r  ~  ft, 
then  asymptotically,  we  have 

||  R\k)  ||~  Ct~2Vn 

where  d  =  kh  is  the  distance  between  the  node  o,-  and  the  node  in  the  fc-th  wavefront. 
There  is  almost  no  influence  to  node  Oi  from  those  nodes  which  are  a  few  wavefronts 
away.  Thus,  if  we  apply  $  to  this  problem,  the  overlapping  needed  is  very  small. 

For  a  higher-dimensional  problem  where  decay  of  the  norm  of  the  wavefront  does 
occur,  the  average  size  of  an  element  in  the  &-th  wavefront  even  diminishes  faster, 
since  the  number  of  elements  in  k- th  wavefronts  will  increase  when  k  increases. 
The  number  of  elements  in  each  wavefront  is  proportional  to  where  a  is  the 

dimension  of  the  solution  space,  but  the  bound  on  the  rate  of  decrease  in  (4.2), 
(4.3),  (4.4)  and  (4.5)  depends  only  upon  the  condition  number  of  the  operator. 
Because  the  condition  number  of  approximations  to  the  model  problem  is  only 
weakly  related  to  the  dimensionality  of  the  solution  space,  the  individual  elements 
of  the  inverse  operator  decrease  faster  in  a  higher-dimensional  grid,  explaining  the 
faster  convergence  of  fytfd  for  higher- dimensional  problems. 
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Unfortunately,  these  bounds  still  cannot  guarantee  that  the  norm  of  the  k-th 
wavefront  of  the  input  or  output  template  of  the  inverse  will  decay  as  k  increases. 
In  Section  4.2,  we  presented  illustrative  counterexamples  and  explained  how  bounds 
such  as  (4.2)  and  (4.3)  can  permit  growth.  The  conditions  of  sparsity  or  positive¬ 
definiteness  are  not  strong  enough  to  ensure  that  decay  will  occur.  Here  we  will 
present  a  sufficient  condition  which  will  yield  decay. 

For  simplicity,  in  the  following  discussion  we  rescale  the  template  operator  L  so 
that 

LXoi)  =  1,  for  all  o,  €  O. 

If  a  template  operator  is  output  or  input  strictly  center  dominant,  then  we  have 
the  following  result. 

Theorem  4.3  Let  L  =  /  —  B  be  a  sparse  template  operator  and  ||  B  =  7  <  1, 
then 

II  Jf  ’  IL  <  Til  if  IL-  (4.6) 

If  II  B  ||j  =  7  <  1, 

II  <?!*’  IL  <  t||  IL-  (4-7) 

Proof.  First  note  that  ||  B  <  1  or  ||  B  {{x  <  1  are  equivalent  to  saying  that  L 
is  input  or  output  center  dominant,  respectively.  Another  important  fact  for  the 
proof  is  the  following:  if 

Oj  €  W<2(o,), 

from  Lemma  4. 1  we  have 

w£’(oi)C  (J  W&oO- 

l=*-l 

This  means  that  the  input  template  R0)  only  has  non-zero  elements  in  the  (k  —  l)-th 
or  higher  influenced  wavefronts  of  the  node  o*.  Similarly,  if 
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The  proof  of  (4.6)  and  (4.7)  are  parallel.  Let’s  prove  (4.6). 
Vi  we  have 

=  {(**  ’  C0,X)t  ,  {Ron, 

=  L 

and 


(R  0},COil)T  -  +  lo,(Oj)l~l(Oj) 

=  £  lo>{°i)lo.l{°k)  +  K?{Oj) 

°*€w£>(o,) 

_  I  °>  *  /  h 
1  1,  *=>• 

First,  we  prove:  when  k  =  p,  (4.6)  is  true.  Let  o,  be  the  node  in  W^)(ot)  such  that 

<?(«*)  =  II  *w  IL- 

Then  we  have 

(■R0,»C'o.1)r  =  £  L(°j)C/(°*)  +  tfi0:) 

o*€^>(o,) 

=  0, 


I  C(»i)  1=  II  p, w  IL  <  E  l  L(»i)  ll  c.'(°*)  l  ■ 

o*€^)(oy) 

Since  W;,|>(0i)  c  U  W(£"(o,), 


P- 

1  I 


( P ) 


L< 


E  l  UM 

»*€W<:{(0i)  nw^(o,) 

( 


F 
*  » 


(p) 


loo* 


£  I  L(°>)  I  ll  p, tl) 

yo-wZT'Ho^n^o,)  J 

^  II  PiP)  lloo  ^  II  PiP~1)  lloo,  we  could  conclude  that  1  <  7,  a  contradiction.  Thus 
can  replace  ||  P[p)  |L  in  the  right  hand  side  of  (4.8)  by  ||  P’(p‘l)  |L  and  obtain 

II  PtP)  Hoc  <  Til  ^(p-1)  IL. 


(4.8) 


we 
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Using  reverse  induction,  suppose 

II  A'*’  IL  <  7»  P?'"  IL.  i~k  + 1, 

is  true.  Let  oj  be  the  node  in  W<*](ot)  such  that 

Oi)  =  ||  i»“>  il. 

Then  we  have 


(Roj,C~l)r  =  51  lok(oj)l~l(ok)  + 

=  0, 


Since 


we  have 


I  <Z'(oi)  1=  II  if’  IL  <  £  |  L(oj)  II  L'(»») 

wLI,(o,)c  U  w™(oi), 

l-k-l 


( 


p!"'  IL  < 


E  I  L(°>)  I 

o*€{  U 

vmk 


II  1 II, 


+  i  E  i  1 1  ii  /f  ii, 

^wiir^oonw^o,) 


(4.9) 


As  with  in  (4.8),  ||  P-  *  >  ||  P/*  will  lead  to  a  contradiction.  We  therefore 

have 

II  Jf’  IL  <  Til  /**-•>  IL.  *  =  1,- 

The  proof  is  completed. 

Notice  that  ||  B  <  1  does  not  ensure  the  conclusion  of  (4.7).  There  is  a 
mistake  in  a  theorem  dealing  with  a  diagonally  dominant  tridiagonal  matrix  in  a 
recent  paper  of  Rong-Qing  Jia.  He  claims  that  ||  B  <  1  could  yield  a  sharp 


4.4.  Estimates  of  the  Decay  of  Inverse  Operators 


67 


estimate  for  both  columns  and  rows.  A  simple  counter-example  is  the  following 
matrix,  which  is  row  diagonally  dominant  but  not  column  diagonally  dominant: 

’  -56.1  35 

21  -55.1  34 

21  -54.1  33  0 

21  -53.1  32 


21  -35.1  14 

21  -34.1 

The  inverse  of  this  matrix  exponentially  decays  in  its  columns  but  not  in  its  rows. 

If  we  loosen  the  condition  of  strictly  center  dominant  to  center  dominant,  then 
following  are  true: 

II  Jf 1 IL  <  II  ff-”  IL  (4.io) 


or 

II  <?!*’  IL  <  II  <?!*“’  IL  (4.ii) 

We  may  also  derive  a  simple  bound  as  follows: 


Theorem  4.4  Let  L  =  I  —  B  be  a  sparse  linear  operator.  If  ||  B  =  7  <  1,  then 

II  ||<  (4.12) 

1  7 

If  II  B  ||,  =  7  <  1,  then 

^k+ 1 

II  c\k)  ||<  (4.13) 

Proof: :  If  ||  B  <  1,  we  have  L~l  =  ££*  &  and  C"1  =  (E~o  From 

lemma  4.2,  it  follows  that 

A'*1  =  (  £;  b")l 

n=k+ 1 
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Consequently, 


Jf  ’  Ho,  -  il  (  £  B")h, 


n=i+l 


1  -7- 


The  proof  of  (4.13)  is  parallel  with  the  proof  of  (4.12). 

For  an  M- -matrix  A,  there  is  a  diagonal  matrix  D  >  0  such  that  DA  is  strictly 
diagonally  dominant.  Applying  this  result  to  the  corresponding  template  operator, 
the  exponential  decay  law  can  then  be  applied  to  the  new  operator. 

It  is  also  very  interesting  that  for  some  problems  (e.g.  the  five  point  discrete 
Laplace  operator  in  a  rectangle,  which  we  will  analyze  in  detail  in  the  next  chapter), 
the  operator  is  not  center  dominant  in  the  physical  space,  but  is  center  dominant 
in  the  Fourier  space. 

The  concept  of  the  wavefront  also  allows  us  to  discuss  the  exponential  decay  for 
some  random  sparse  linear  operators  without  ordering  the  nodes  since  there  is  no 
ordering  relationship  involved  in  the  definitions  of  the  influencing  and  influenced 
wavefronts. 


Chapter  5 


Model  Problem  Analysis 


The  exponential  decay  law  presented  in  the  last  chapter  has  shown  that  there  is  a 
relationship  between  the  overlap  and  the  convergence  of  SfiM.  A  general  quantitative 
relation  is  very  hard  to  derive  for  arbitrary  cases.  A  common  approach  is  to  analyze 
prototype  model  problems.  In  this  chapter  we  present  spectral  radius  analyses  of  the 
Schwarz  splitting  ( Ss )  for  model  problems  in  one-  and  higher-dimensional  solution 
spaces.  We  have  found  that  the  convergence  speed  of  is  a  function  of  the 
overlap,  the  geometries  of  the  subregions,  the  frequency  of  the  Fourier  component 
and  the  dimension  of  the  solution  space. 

The  relationship  between  convergence  and  the  area  of  the  overlap  has  been 
observed  previously.  Miller  [Mil65]  proved  a  result  for  the  case  of  two  overlapping 
rectangles,  while  Kantorovich  and  Krylov  mentioned  in  their  convergence  proof 
that  the  convergence  rate  is  related  to  the  geometries  of  the  subregions.  They  were 
mainly  interested  in  solving  elliptic  equations  in  irregular  regions;  an  analysis  for 
applications  motivated  by  parallel  processing  and  composite  grids  has  not  been 
carried  out.  Our  analyses  extend  the  earlier  work  in  the  following  respects: 

•  The  number  of  the  subregions  can  be  an  arbitrary  finite  number. 

•  A  quantitative  relation  between  the  convergence  and  the  shapes  of  the  subre¬ 
gions  is  shown. 

•  A  relation  between  the  convergence  and  the  dimension  of  the  solution  space 
is  explored. 

•  For  two-  or  higher-dimensional  solution  spaces,  the  analyses  are  carried  out  in 
Fourier  space.  The  convergence  speeds  for  different  frequencies  are  presented. 
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•  An  analysis  of  higher-order  finite  difference  schemes  is  carried  out. 

These  analyses  have  provided  guidelines  for  implementing  am  efficient  parallel  algo¬ 
rithm  for  the  solution  of  elliptic  PDE’s. 

5.1  One— Dimensional  Case 

There  is  no  practical  reason  for  paralellizing  the  solution  of  a  one-dimensional 
model  problem,  but  the  analysis  of  this  problem  provides  some  results  useful  for 
the  higher-dimensional  cases.  It  also  makes  the  whole  analysis  more  complete. 

The  model  problem  in  one  dimension  which  we  will  consider  is 

/(*)-/(*),  *€(0,1), 

y(0)  =  or;  y(l)  =  0. 

After  discretization  using  a  centered  second  order  method,  the  resulting  linear  sys¬ 
tem  is 

Tnx  =  b,  (5.1) 

where 

Tn  =  Tridiagonal  {l,  -2,  l}„Xn- 

The  fyM  for  solving  this  problem  divides  the  region  into  k  overlapping  subregions 
Qt  i  =  as  shown  in  Figure  5.1.  (To  simplify  the  analysis  we  assume  the 

overlap  pattern  is  uniform.  Similar  conclusions  can  be  deduced  for  more  general 
cases.) 

Let  h  be  the  grid  size,  i  the  length  of  the  overlap  and  tj  the  length  of  every 
subregion.  Then  let  n  +  1  =  /  =  £  and  m  - f  1  =  J.  The  circular  points  in  Figure 

5.1  are  the  boundaries  of  the  subregions.  A  natural  way  to  implement  is  to 
first  guess  some  “reasonable”  initial  values  on  the  artificial  boundaries  and  then  to 
solve  these  subproblems  separately.  Next,  use  the  solutions  of  these  subproblems 
to  update  the  values  on  the  artificial  boundaries  and  proceed  iteratively  until  the 
solutions  on  the  overlapping  regions  converge.  If  we  solve  on  these  subregions  in  a 
natural  order,  each  succeeding  subregion  takes  its  boundary  values  from  the  new 
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Figure  5.1:  One-dimensional  overlapping  grid. 

solution  on  the  previous  subregion.  As  we  have  shown  in  Chapter  3,  this  procedure 
is  equivalent  to  applying  the  block  Gauss-Seidel  method  to  the  Schwarz  enhanced 
equation  (•%£): 

Tm  F n 
Em  Tm  Fm 

Tx  =  x  (5.2) 

Em  Tm  Fm 

Em  Tm 

=  {Tm  ®h  +  Em®Lk  +  Fm®  Uk)x  =  b. 

The  corresponding  block  Gauss-Seidel  iteration  for  this  equation  is  as  follows: 

{Em  ®Lk  +  Tm®  /fc)x(*+I)  =  -{Fm  ®  Uk)x w  +  4.  (5.3) 

The  quantities  above  are  defined  as: 

•  Em-  an  m  x  m  matrix  with  zero  elements  everywhere  except  for  1  in  position 
(1,  m  —  /  4-  1). 
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•  Fm:  sax  m  x  m  matrix  with  zero  elements  everywhere  except  1  in  position 

(m,Z). 

•  Ik',  a.  k  x  k  identity  matrix. 

•  Lk'.  a  k  x  k  matrix  with  zero  elements  everywhere  except  for  l’s  on  the  sub¬ 
diagonal. 

•  Uk :  a  k  x  k  matrix  with  zero  elements  everywhere  except  for  l’s  on  the  su- 
perdiagonai. 

As  we  showed  in  Chapter  3,  (5.1)  and  (5.2)  are  equivalent.  Therefore,  the 
convergence  analysis  of  is  reduced  to  calculating  the  eigenvalues  of  the  block 
Jacobi  matrix  J  =  M~lN  of  the  *$>  where 

M  =  Tm  ®  Ik, 

N  —  Em  ®  Lk  +  Fm  ®  Uk- 

If  we  multiply  out  M~*N  then 

J  —  (I'm  ®  ifc)  l(Em  ®  Lk  +  Fm  ®  Uk) 

=  (TZ1  ®  h)(Em  ®  Lk  +  Fm  ®  Uk) 

=  (TZlEm)  ®  I*  +  (T-xFm)  ®  Uk 
=  Em®Lk  +  Fm  ®  Z7*, 


where  Em  and  Fm  have  almost  all  zero  elements  except  columns  (m  —  i  +  1)  and  /, 
respectively.  The  rank  of  this  matrix  is  clearly  at  most  2 (k  —  1).  After  some  row 
and  column  exchanges  J  can  be  transformed  to  J,  which  is  similar  to  J : 


J  =  UJUT  = 


0(r»-2*)x(n-2Jfc)  C(n-2k)x2k 
®2kx(n—2k)  G?kx2k 


where 

G  =  D'  ®  Ik  +  E'  ®  Lk  +  F'  ® 
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_  ^  .  _  m  +  1  —  / 

(m  +  l)1  (m  4- 1) 

It  is  clear  that  Xj  6  (0  U  Ag),  where  A j  and  Ag  axe  the  eigenvalues  of  matrices  J 
and  G ,  respectively.  Using  matrix  polynomial  theory  we  may  obtain  the  following 
theorem. 


Theorem  5.1  If  a  <  b  then  A g  satisfies  the  following  equation: 

Xq  +  2*  a  *  cos  0  *  Ag  +  a2  —  b2  =  0 
where  the  parameter  0  is  the  root  of  the  following  equation: 

sin((fc  —  1)0) 


cos  0  -+■ 


N 


jV-sin’#!  = 


sin  kd 


(5.4) 


The  proof  of  this  theorem  is  lengthy  and  has  nothing  to  do  with  the  discussion  of 
We  present  it  in  an  appendix. 

Let 

p  =  max  {|  Ag  |}  =  max{|  A  j  |}, 

it  is  easy  to  show  that  p  corresponds  to  the  smallest  root  0*  of  equation  (5.4).  In 
particular,  if  k  =  2 

p  =  b, 

and  if  k  =  3 

p  =  Vb. 

Now  we  can  immediately  observe  some  important  facts  about  SfiM  : 


1.  First,  the  spectral  radius  of  J  only  depends  on  the  number  of  subregions  k 
and  the  overlapping  area  a.  If  both  k  and  a  are  independent  of  the  mesh  size 
h,  then  the  convergence  of  fytf  is  also  independent  of  h.  Figure  5.2  shows  the 
distribution  of  the  roots  0  when  k  =  4, 5, 6, 7,  with  a  =  0.4  and  b  =  0.6  for  all 
four  values  of  k.  As  k  increases,  the  curve  for  the  left  hand  side  of  equation 
(5.4)  remains  the  same,  while  the  frequency  of  the  jumps  in  the  curve  for 
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the  right  hand  side  increases.  We  can  see  that  the  smallest  root  of  equation 
(5.4)  moves  leftward  when  k  is  increased.  This  implies  that  p  increases  if  k  is 
increased.  From  these  pictures  we  can  also  see  that  matrix  G  has  2k  distinct 
real  eigenvalues  when  a  <  b. 

2.  For  the  cases  of  k  =  2  and  3,  we  notice  that  when  the  overlapping  area 
increases,  p  decreases,  and  when  k  increases,  p  increases.  These  conclusions 
also  are  valid  for  the  general  case  ( k  >  2).  We  cannot  give  a  closed  form 
solution  for  k  greater  than  5,  but  the  numerical  results  indicate  that  these 
results  hold.  Figure  5.3  1  shows  the  theoretical  and  computational  values  of 
p.  The  computational  results  (denoted  by  0,  A  etc..)  are  very  well  matched 
with  the  theoretical  values.  This  picture  also  indicates  that  the  conclusions 
we  mentioned  are  general. 

3.  Furthermore,  the  SfeM  of  the  matrix  Tn  has  Property  [You71],  thus  the 
Gauss-Seidel  iteration  can  certainly  be  improved  by  the  SOR  acceleration. 
Since  p  is  known,  the  optimal  relaxation  parameter  can  be  estimated  exactly 
(see  Chapter  6  for  detailed  discussion). 


Increasing  k  =  l/m  corresponds  to  increasing  overlap. 


$ 


;) 


i 

'I 


& 


(t/m)  m  1  -  b 


Figure  5.3:  Theoretical  and  computational  values  of  the  squared  spectral  radius  for 
the  block  Jacobi  iteration  matrix  in  the  1-D  case. 
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5.2  Two—  and  Higher-Dimensional  Cases 


Two-dimensional  model  problems  are  commonly  used  to  test  numerical  methods 
for  the  solution  of  elliptic  PDE’s.  Here  we  use  a  method  which  combines  Fourier 
analysis  with  the  method  used  in  the  last  section  to  analyze  the  application  of 
■SjAf  to  the  two-dimensional  model  problem.  The  same  approach  can  be  applied  in 
higher-dimensional  cases. 

The  Poisson  equation  in  two-dimensions  is: 


d2U 

dx1 


+ 


d2U 
dy 2 
U  |r 


=  g(x,y). 


(x,y)  €  (0, 1)  x  (0, 1),  (5.5) 

(5.6) 


Using  central  differences  we  obtain  a  discretization  of  this  equation: 


Ax  =  b, 


(5.7) 


where 


A  =  Tn®In+In®  Tn. 


This  is  of  the  same  form  as  we  obtained  in  the  one-dimensional  case,  with  h  being 
the  mesh  size  and  n  +  1  =  If  we  cover  (0,1)  on  the  x  axis  with  k  subregions 
as  in  the  one-dimensional  case,  then  the  solution  area  is  covered  by  k  overlapping 
rectangles  as  shown  below  2 . 

If  we  apply  to  these  overlapping  subregions,  then  it  is  equivalent  to  applying 
the  Gauss- Seidel  method  to  the  following 

F'm 

E'm  Wm 


Ax  = 


x  (5.8) 


E'm  Wm  F'm 

E'~ 

=  {Wm  ®  Ik  +  {In  ®  Em)  ®  Lk  +  (In  ®  Fm)  ®  Uk}I  =  b, 

2The  subregions  sue  shifted  upwards  to  improve  visibility  of  the  overlapping  pattern. 
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Figure  5.4:  Two-dimensional  overlapping  grid. 

where 

Wm  ~  Tn  <0  Im  +  In  ®  Tm,  Em  —  In  ®  Em,  Fm  —  In  ®  Fm. 

In  order  to  analyze  the  convergence  of  ,  we  need  to  study  the  spectral  radius 
of  the  block  Jacobi  iterative  matrix  of  the  <$>: 

J  =  M~lN 


where 

M  =  0  Ik,  N  =  (/„  ®  Em)  +  (In  ®  Em)  0  I/fc. 

We  have  the  following  result: 


Theorem  5.2  The  matrix  J  is  similar  to  the  matrix 

D  = 


0(n3-2nfc)x(n’-2nk)  C  (n* -2nk)x7nk 
02nfcx(nJ-2n*)  G2nkx2nk 


where 


G  =  Block  —  diagonal{Di},  t  = 
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D,  =  D\  ®  Ik  +  E[  ®  Lk  +  F[  ®  U k , 


a,  0 

'  0 

A  ' 

'  0  0  ' 

i  A  = 

,  f\  = 

°° 

.A 

0 

0  a, 

sinh  nmdi 
sinh  m6i  ’ 


fli  = 


sinh  (1  —  K)mdx 
sinh  m0, 


cosh  0,  =  2  —  cos -  i  =  1,  •  •  • ,  n  , 

n  +  1 

and  k  =  l/m  is  the  overlap  ratio.  Let  px  be  the  spectral  radius  of  the  Dt,  then  each 
pi  is  the  convergence  factor  for  the  corresponding  Fourier  component  of  the  error 
in  the  approximation. 


Proof.  Let 


U  =  (Xn®  Im)  ®  Ik 

where  Xn  is  an  orthogonal  matrix  whose  columns  are  the  eigenvectors  of  the  matrix 
Tn,  and  U  is  an  orthogonal  matrix.  Note  that  UNUT  =  N.  Then 


J'  =UJUT  =UM~1NUt 
=  ( UMUT)-lN 

=  {((. XnTnXl  ®  Im)  +  In  ®  Tmyl  ®  J*}JV 
=  {(A,  ®  Im  +  In  ®  T*)’1  ®  7*}iV 


where  Dn  is  a  diagonal  matrix  whose  diagonal  elements  are  the  eigenvalues  of  Tn 
We  know  that  there  is  a  mn  x  mn  permutation  matrix  P  such  that 

P(A  ®  B)Pt  =  B®A 

where  A  and  B  are  any  n  x  n  and  m  x  m  matrices,  respectively.  So  we  have 
P(In  0  Em)PT  =  Em  ®  P(/„  0  Pm)PT  = 


P(Dn  ®  /m  +  /„  ®  Tm)Pr  =  /m  ®  A  +  rm  ® 
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Notice  that: 

Q  =Tm®Dn  +  Tm®In 

=  Block-diagonal{tt}nXn , 

fi  =  Tridiagonal{l,ii,l}mxm 
in 

7 •  =  -4  +  2cos( — — ),  t  =  1,  •  •  • ,  n. 

n  +  1 

Let  P  =  P  (g>  /*.  Then  we  have 

J"  =  PJ'PT  as  P{{Dn  8  Im  +  In  8  Tm)~l  ®  Ik}PT PN PT 

=  {(P(Dn  ®  Im  +  /n  ®  ®  /*}{(£«  ®  /„)  ®  Z*  +  (Fm  8  /„)  ®  C/*} 

=  (^-1  ®  /fc){(£m  ®  /«)  <8>  Lk  +  {Pm  8  In)  8  Uk) 

=  [Q~\Em  8  /n)]  8Lk  +  [Q-l(Fm  8  In)}  8  Uk. 

As  in  the  one-dimensional  analysis,  we  can  move  all  of  the  non- zero  columns  to 
the  last  columns  and  the  theorem  follows. 

Since  the  structures  of  these  diagonal  blocks  are  the  same  as  those  analyzed  in 
the  one-dimensional  case,  we  can  find  a  tight  estimate  of  pj,  the  spectral  radius  of 
J,  by  using  theorem  5.1.  But  here  it  is  clear  that 

a,  +  0i  <1,  a,  >  0,  0i  >  0  *  =  1,  •  •  • ,  n 

and  thus  we  cannot  derive  a  closed  form  of  pj  for  general  k,  but  we  may  use  the 
Gershgorin  theorem  to  get  a  very  good  bound  for  pj. 

Corollary  4 

pj  <<*  1  +  di¬ 
ll  we  denote  p  =  it  is  easy  to  estimate  the  asymptotic  bound  for  pj  (as  h  — ►  0): 
Corollary  5  If  k  =  2, 

sinh((l  —  n)pir) 

~  sinh(/i7r) 


PJ  < 


sinh(«/i7r)  -f  sinh((l  —  K)pir) 
sinh(/i7r) 


If  k>  2, 


V 


OVBUP  Pi 


Figure  5.5:  Theoretical  and  computational  values  of  the  squared  spectral  radius  for 
the  block  Jacobi  iteration  matrix  in  the  2-D  case. 


Figure  5.5  3  shows  that  the  estimate  derived  from  the  Gershgorin  theorem  is  quite 
accurate.  The  computational  results  (denoted  by  ©,  A  etc..)  are  very  close  to  the 
theoretical  curve.  Note  that  the  curves  are  the  asymptotic  bounds  of  pj. 

From  this  theorem  and  its  corollaries  the  following  conclusions  can  be  deduced: 


•  The  convergence  rate  of  is  a  function  of  the  overlap  ratio  *.  If  k  is 
constant4,  then  the  convergence  rate  of  ^  is  independent  of  the  mesh  size. 
This  is  where  the  conclusion  about  optimal  complexity  comes  from.  If  an 
optimal  algorithm  is  used  for  solutions  of  these  subregions,  the  total  compu¬ 
tational  work  required  for  achieving  a  fixed  accuracy  is  proportional  to  the 
number  of  discrete  unknowns. 


increasing  t  corresponds  to  increasing  overlap.  Note  that  the  domain  sise  increases  with  increas¬ 
ing  overlap  when  the  subregion  sise  is  held  fixed. 

4This  means  that  *  is  independent  of  the  mesh  sise.  We  also  assume  here  that  the  number  of 
subregions  is  independent  of  h. 


*  *  **— - — —— ^ 
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k  =  0.03 


Figure  5.6:  Three  two-dimensional  overlapping  grids. 

•  The  convergence  rate  of  is  also  a  function  of  the  shape  of  the  subregions, 
specifically,  a  function  of  p.  If  the  subregions  are  thin  in  the  overlapping 
direction  (usually  caused  by  increasing  the  number  of  subregions),  then  the 
convergence  rate  is  slow.  This  suggests  that  we  should  avoid  slicing  the  do¬ 
main  into  many  thin  overlapping  subdomains.  A  multidirection  decomposition 
strategy  is  proposed  in  the  next  chapter. 

•  As  in  the  one-dimensional  case,  the  SeM  of  the  matrix  A  has  property  A^K 
Therefore,  some  classical  acceleration  schemes  can  be  applied. 

•  The  convergence  factor  decreases  monotonically  when  the  frequency  increases. 
Furthermore,  if  the  overlap  is  increased,  the  errors  of  high  frequencies  are 
damped  exponentially  faster  than  for  the  smaller  overlap.  The  picture  above 
shows  three  different  overlapping  grids.  The  corresponding  table  presents  how 
the  convergence  factor  p,-  is  changing  when  the  the  overlap  and  frequency  are 
changed.  The  last  column  lists  the  number  of  iterations  needed  to  reduce 
the  errors  of  the  corresponding  Fourier  components  by  a  factor  of  10s.  An 
important  message  which  cm  be  obtained  from  this  table  is  that  we  should 
combine  the  strategies  of  increasing  overlap  and  using  multi-level  grids.  In 
the  following  chapter  we  will  discuss  the  accelerating  strategies  in  detail. 


5.2.  Two-  and  Higher-Dimensional  Cases 


S3 


Frequency 

i 

K 

Matrix  D, 

Number  of 
iterations 

OCi 

Pi 

Pi 

1 

0.5 

0.3794 

0.3794 

0.5366 

18 

0.25 

0.1987 

0.6754 

0.7684 

44 

0.03 

0.0256 

0.9602 

0.9729 

420 

6 

0.5 

0.0095 

0.0095 

0.0135 

3 

0.25 

0.0037 

0.1555 

0.1574 

7 

0.03 

0.8058*10-“ 

0.8302 

0.8306 

62 

11 

0.5 

0.0002 

0.0002 

0.0003 

2 

0.25 

0.3841*10"“ 

0.0338 

0.0338 

4 

0.03 

0.9612*10_e 

0.7126 

0.7126 

34 

16 

0.5 

0.5067*10-5 

0.5067*10-s 

0.7166*10-5 

1 

0.25 

0.4423*10-7 

0.0076 

0.0076 

3 

0.03 

0.1039*10-8 

0.6141 

0.6141 

24 

21 

0.5 

0.1409*10-7 

0.1409*10-7 

0.1987*10-6 

1 

0.25 

0.5987*10-8 

0.0018 

0.0018 

2 

0.03 

0.1215*10-B 

0.5320 

0.5320 

19 

Table  5.1:  Convergence  factors  for  three  two-dimensional  overlapping  grids. 


The  model  problem  in  a  uniform  p-dimensional  cube  is  as  follows: 
p  pa 

/• 

1=1 

U  |r=  9- 


As  in  the  two-dimensional  case,  the  cube  is  divided  into  k  overlapping  subcubes. 
Figure  5.7  shows  a  3-dimensional  cube  and  its  decomposition.  The  subcubes  are 
shifted  upwards  to  improve  the  visibility  of  the  pattern  of  overlap. 

The  same  approach  is  used  for  this  problem  as  for  the  former  case.  Before 
discussing  the  analysis,  some  notation  needs  to  be  defined.  Let 

=  Tn 

=  tridiagonal  {1,  -2,  l}nxn 

and  I^\n)  be  the  n  x  n  identity  matrix.  If  there  is  no  confusion,  we  will  use  7(1' 
instead  of  /^(n).  We  can  recursively  define  the  matrix5  derived  from  the  model 


5  As  in  the  two-dimensional  case,  a  central  difference  scheme  is  used  here. 
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The  corresponding  SeP  and  its  Jacobi  iterative  matrix  of  the  $3  are  as  follows: 

T^x  =  {WM  ®  Hr-')(k)  +  (J(p-1)(n)  0  Em)  0  Lk  +  (J<p-l>(n)  0  Fm)  0  C/*}x 

=  i, 

J(p)  =  M~lN 

=  (WW  ®  /(P-i>(*))-i((/(P-i)(n)  ®  £m)  ®  I,  +  (J^-i)(n)  ®  Fm)  ®  Vk), 

where 

Wif)  =  T^p-1)  ®  I^\m)  +  /^(n)  0 

A  result  which  is  very  similar  to  that  obtained  in  the  two-dimensional  case  can  be 
obtained  as  follows: 


Theorem  5.3  The  matrix  J ^  is  similar  to  the  matrix 
H  = 


where 


0(n»'-(2n*)*-1)x(fi»'-(2nfc)»'-l)  C(nJ>-.(2n*)P->)x(2n*)P- 

0(2n*)*-»x(n*-(2n*)*-l)  G(2„*)p-i  x(2nfc)P-l 

G  =  Block - diagonal  {Du}, 

Du  =  D?v  0  Ik  +  El  0  Lk  +  F'  0  £4, 


E'  = 


a„  0 

,  d:  = 

* 

0 

* 

_ 1 

** 

11 

1 - 

0 

0 

1 _ 

1 

0 

0 

0  J 

w 

[  0  a,  J 

sinh 


Of,  = 


&  = 


sinh(l  —  K)m^ 


sinhm0„  ’  ""  sinhm^ 

cosh  6V  —  2 P  cos( '  ^?r 

v  =  (M»- 1  •  ,iP), 


), 


n  +  1 

*1,  ■  , »p  =  1,  •  •  •  , n. 

TAe  spectral  radius  of  each  D„  is  the  convergence  factor  for  the  corresponding 
Fourier  component  of  the  error  in  the  approximation. 


Proof.  The  proof  is  completely  parallel  with  the  two-dimensional  case.  We  only 
need  to  change  the  Fourier  transform  matrix  from  A(1'  to  A'(p-1). 

There  is  also  a  corresponding  result  for  the  asymptotic  bound: 


SCHUARZ  SPL ITTING  AMD  TEMPLATE  OPERATORSIU)  STANFORD 
UN IV  CA  CENTER  FOR  LARGE  SCALE  SCIENTIFIC  COMPUTATION 
U  TANG  JUL  87  CLASSIC-87-2  N88844-82-K-8225 


UNCLASSIFIED 


F/a  12/2 
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Figure  5.8:  Three  three-dimensional  overlapping  grids. 

Theorem  5.4  For  the  p-dimensional  model  problem  the  asymptotic  bound  for  the 
spectral  radius  of  the  block  Jacobi  iterative  matrix  of  the  &  is: 

sinh(y/p  -  1/c/iir)  +  sinh( y/p  —  1(  1  -  «)/i7r) 


sinh(>/p  -  1/jt) 


The  following  picture  and  table  present  examples  similar  to  those  presented 
for  the  two-dimensional  case.  The  same  conclusions  can  also  be  found  in  higher- 
dimensional  cases.  If  we  compare  this  table  with  Table  5.1,  an  interesting  observa¬ 
tion  is  that  the  convergence  rate  of  the  higher-dimensional  case  is  faster.  Actually, 
we  can  derive  this  conclusion  directly  from  Theorem  5.4.  A  more  favorable  result 
is  that  the  errors  in  the  higher  frequency  components  damp  even  faster  than  in 
the  two-dimensional  case.  Thus  the  strategy  of  a  multi-level  grid  will  be  more 
successful. 

5.3  Higher-Order  Approximation  Cases 

In  this  section  we  will  discuss  the  convergence  behavior  of  fyM  for  higher-order 
approximations  to  separable  elliptic  PDE’s. 


v  '>■  v  v  ’i 


,v5v  y- 


v  >.-« 
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Frequency 

*  =  J 

K 

Matrix  D.,> 

Number  of 
iterations 

<*«.  j 

Aj 

Pi,j 

1 

0.5 

0.2998 

0.2998 

0.4239 

14 

0.25 

0.1622 

0.6164 

0.6923 

32 

0.03 

0.0218 

0.9518 

0.9627 

302 

6 

0.5 

0.0014 

0.0014 

0.0020 

2 

0.25 

0.3748*10"° 

0.0722 

0.0724 

5 

0.03 

0.9105*10"'* 

0.7689 

0.7690 

44 

11 

0.5 

0.6625*10~s 

0.6625*10“5 

G.9369*10“5 

1 

0.25 

0.6101*10"° 

0.0085 

0.0085 

3 

0.03 

0.1445*10"° 

0.6207 

0.6207 

25 

16 

0.5 

0.3820*10“7 

0.3820*10"7 

0.5402*10"''’ 

1 

0.25 

0.1255*10"“ 

0.0011 

0.0011 

2 

0.03 

0.2383*10"8 

0.5050 

0.5050 

17 

21 

0.5 

0.2869*10"9 

0.2869*10"® 

0.4058*10~9 

1 

0.25 

0.3543*10~1J 

0.1524*10"° 

0.1524*10"° 

2 

0.03 

0.5055*10"li" 

0.4153 

0.4153 

14 

Table  5.2:  Convergence  factors  for  three  three-dimensional  overlapping  grids. 


The  two-dimensional  separable  elliptic  problem  on  the  rectangle  [0, 1]  x  [0, 1] 
may  be  stated  as  follows: 

+  +  +  q2^U  =  y)’ 
u  |r=  g{x,y)- 

where 

Pi(*)iPa(y)  >  £  >  0 

Here  the  unknown  U(x,  y)  will  be  approximated  by  tensor-product  B-splines.  When 
the  Rayleigh-Ritz-Galerkin  discretization  using  this  approximation  is  applied  to  the 
above  equation,  it  gives  rise  to  a  matrix  equation 

Ax  =  b, 


where  the  matrix  A  is  of  the  form 

A  =  Ms  0  Sy  +  Sr  ®  My. 
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If  the  B-splines  we  of  order  k,  then  the  matrices  Mx,  Mv,  Sr,  5y  are  2k  —  1  banded 
matrices  arising  naturally  in  one-dimensional  problems.  Moreover,  Mx ,  Mv  are 
symmetric  positive  definite  and  Sx,  S„  are  symmetric  semidefinite.  We  will  not 
discuss  the  derivation  of  this  system  here.  Interested  readers  can  refer  to  the  paper 
[KW84]. 

Let  the  decomposition  of  the  solution  region  be  the  same  as  in  the  two-dimensional 
case.  After  a  slightly  more  complicated  derivation,  we  may  obtain  the  of  the 
matrix  A  as  follows: 

'  Wi  Fm 
Em  W3  Fm 


Em  Wk-t  F„ 


where 


W,  =  Mi  0  Sy  +  Si  0  My, 
Ei  =  L\i  0  Sy  +  Xji  0  My, 


Fi  =  U\i  0  Sy  AUli®  My. 

Here  the  matrices  A/,  and  Si  axe  the  B-spline  matrices  from  the  subregion  fi,,  while 
the  matrices  Lij  and  Uij  axe  matrices  with  zero  elements  everywhere  except  for 
a  lower  or  upper  triangular  matrix  at  the  position  (l,m  —  /  —  d)  or  (m,  l)  which 
is  related  to  the  boundary  conditions  on  these  artificial  boundaries.  Because  the 
detailed  definitions  of  these  matrices  depend  on  the  particular  approximation,  we 
will  not  discuss  them  here.  An  example  will  be  presented  later. 

In  contrast  with  the  case  in  the  last  section,  we  cannot  use  a  Fourier  transfor¬ 
mation  to  diagonalize  both  matrices  Sv  and  My  here.  Fortunately,  a  generalized 
eigensystem  will  do.  Since  Mv  is  symmetric  and  positive  definite  and  5V  is  symmet¬ 
ric,  there  exists  a  matrix  Z  such  that 

ZTMyZ  = 
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ZTSVZ  =  Dn, 


where  /„  is  the  identity  matrix  and  Dn  is  the  diagonal  matrix.  The  diagonal  elements 
of  £>n  are  the  generalized  eigenvalues  A  of 


5yz  =  A-V/y  z. 


Now  a  similar  approach  can  be  applied  to  analyze  the  spectral  radius  of  the 
Jacobi  iterative  matrix  of  the  SS-  The  matrix  Z  3  Im  can  be  used  to  diagonalize  the 


matrices  ,l/y  and  Sy  in  .4  such  that: 


(ZT  3  In)Wt(Z  3U)  =  X  ®  Dn  4-  S<  ® 

(ZT  3  In)E,(Z  3  /m)  a®  £lt  ®  Dfi  +  X*  ®  /«, 
(ZT  3  In)F,(Z  3  In)  *Uu3D„  +  Uii3ln. 


Let  P  be  the  permutation  matrix  such  that  P(.A  3  B)PT  =  B  3  A,  then 


P(Mt  3  Dn  +  Si  3  U)PT  =  Block  -  diagonal{B , } , 
P{LU  ®  Dn  +  3  /n)Pr  =  Block  -  diagonal  {C,}, 

P(Uu  ®  Dn  +  t/j,  3  In)PT  =  PlocA  -  diagonal {£/,}, 


where  {5,}fflXn  is  a  2d  —  1  banded  matrix,  {C}mxm  is  a  matrix  with  zero  elements 
everywhere  except  a  d  x  d  lower  triangular  submatrix  in  the  position  (1,  m  —  /  -  d), 
{Hi }mxm  is  a  matrix  with  zero  elements  everywhere  except  a  dx  d  upper  triangular 
submatrix  in  the  position  (m,  /).  Now,  following  the  same  approach  as  in  the  proof 
of  Theorem  5.2,  we  can  prove  the  following  theorem: 


Theorem  5.5  The  Jacobi  iterative  matrix  of  the  SS  is  similar  to  the  matrix 


0(„a-2n<iA)x(n*-3rud>)  C(n2-2ndk)x2ndk 


02rwi*x(^-2»w<ik)  Gindkxlndk 


where 


G  =  Block  -  diagonal{Di),  i  =  l,--,rc. 


Di  =  D\  3  h  +  E{  3  Lk  +  K  3  Uk, 
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ft,  0 


0  ft. 


ft,  =s,a-‘a, 

Q.  =  T,BrlUi, 

l 

Si  [  0,  •  •  ,  0,  /*,  0,  •  *  •  ,  0 

l 

Let  pi  be  the  spectral  radius  of  the  A,"  then  each  pi  u  the  convergence  factor  of  the 
error  component  of  the  corresponding  generalized  eigenvalue. 

Here  we  cannot  present  a  general  quantitative  estimate  for  pi,  but  a  similar 
qualitative  result  such  as  the  results  in  the  last  few  sections  is  also  true  in  this 
case.  When  we  increase  the  overlap,  /*  will  move  leftward  or  upwards  in  Si  or  Ti 
(l  =  i/h).  By  the  exponential  decay  law  in  the  last  chapter,  the  norm  of  ft,  and  Q, 
will  exponentially  decay.  If  the  overlap  ratio  *  is  independent  of  the  mesh  size  h. 
then  these  norms  are  also  independent  of  h,  as  can  be  seen  in  the  following  example. 

Strictly  speaking,  the  nine-point  stencil  is  not  derived  from  the  tensor  product 
5 -spline.  Since  the  matrix  derived  from  the  nine-point  stencil  has  the  simplest 
tensor  product  form  and  also  has  higher-order  accuracy,  we  present  it  as  an  example, 
discussing  the  convergence  behavior  when  is  applied  to  this  problem. 

The  matrix  equation  derived  from  the  nine-point  stencil  on  a  unit  square  is  as 
follows: 

(M,  ®  5,  +  5,  ®  Mv)x  =  b , 


where 


=  Tri  diagonal  {1,  —2,  1}, 

=  Tridiagonal  { 1,  4,  1}, 

_  r 
— 

=  Tridiagonal  { 6.  —12,  6}. 
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Frequency 

i 

K 

Matrix  Di 

Number  of 
iterations 

Oti 

Pi 

1 

0.5 

0.3794 

0.3794 

0.5366 

18 

0.25 

0.1987 

0.6754 

0.7684 

44 

0.03 

0.0256 

0.9602 

0.9729 

420 

6 

0.5 

0.0094 

0.0094 

0.0135 

3 

0.25 

0.0036 

0.1547 

0.1565 

7 

0.03 

0.7939*10-“ 

0.8298 

0.8298 

62 

11 

0.5 

Q.1928*10-2 

0.1928*10-2 

0.2726*10“2 

2 

0.25 

0.3481*10"“ 

0.0327 

0.0327 

4 

0.03 

0.8709*10** 

0.7103 

0.7123 

34 

16 

0.5 

0.3950*10"s 

0.3950*10-* 

0.5585*10"* 

1 

0.25 

0.3280*10“' 

0.0069 

0.0069 

3 

0.03 

0.7641*10“* 

0.6086 

0.6086 

24 

21 

0.5 

0.8092* 10“7 

0.8092* 10"7 

0.1 144*  10"* 

1  ^ 

0.25 

0.3088*10-* 

0.0015 

0.0015 

2 

0.03 

0.6097*10"“ 

0.5240 

0.5240 

18 

Table  5.3:  Convergence  factors  for  three  two-dimensional  overlapping  grids  using 
nine-point  stencil. 


Since  Sw  is  an  identity  matrix,  the  generalized  eigenvectors  are  the  same  as  the 
Fourier  components.  Applying  the  above  theorem  to  this  matrix  equation  we  have 

_  sinh  Km&i  _  sinh  (1  —  /e)m0,- 

sinh  m$i  ’  '  sinh  mtf; 

5  —  2  cos  rfr 

eosh^  =  —— -  „  ,  z  =  l,---,n. 

2'*'cos^m 

It  is  not  very  difficult  to  see  that  the  higher-order  approximation  has  the  same 
asymptotic  bound  for  pj  as  in  the  Corollary  2  of  the  last  section.  The  above  table 
lists  the  convergence  factors  for  different  frequencies  and  the  number  of  iterations. 
The  decompositions  are  the  same  as  in  Figure  5.6.  Compare  this  table  with  Table 
5.1.  The  iteration  counts  are  exactly  the  same  except  the  last  one.  But  for  the 
higher  frequency  errors  the  convergence  factors  are  slightly  better.  We  can  also 
prove  this  conclusion  by  comparing  the  Q ,  with  a,  and  d,. 
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Acceleration  of  the  Convergence  and  Numerical 
Experiments 

The  discussion  of  the  last  chapter  ha/i  provided  insight  into  the  behavior  of 
Particularly,  some  possibilities  for  further  improvement  in  the  performance  of  this 
method  have  been  mentioned.  In  this  chapter,  detailed  discussions  of  the  accel¬ 
eration  strategies  are  presented.  Section  1  discusses  the  SOR  (Successive  Over 
Relaxation)  acceleration.  For  the  model  problem,  the  classical  theory  of  SOR  can 
be  applied  here  directly.  Both  theoretical  and  experimental  results  show  that  the 
improvement  is  significant.  In  order  for  a  parallel  algorithm  to  be  efficient,  global 
communications  should  be  avoided  as  much  as  possible.  Here  a  local  relaxation 
scheme  is  discussed,  and  a  general  convergence  proof  for  this  acceleration  is  shown 
providing  a  theoretical  basis  for  the  scheme.  Section  2  discusses  the  application 
of  other  classical  acceleration  schemes.  Since  we  have  obtained  the  eigenstructure 
of  the  iterative  matrix  of  the  plain  many  acceleration  schemes  for  can 
be  analyzed.  Particularly,  optimal  Chebychev  acceleration  is  studied  here.  There 
are  other  powerful  accelerations  schemes,  such  as  preconditioned  conjugate  gradient 
methods  which  are  not  mentioned  here.  This  is  not  a  oversight.  They  are  very  good 
iterative  methods  for  a  conventional  computer,  but  they  require  a  global  informa¬ 
tion  exchange  in  every  iteration  and  introduce  a  lot  of  communication  overhead, 
and  the  parallel  efficiency  degenerates.  If  a  new  technology,  which  can  reduce  the 
high  cost  of  the  global  information  exchange,  appears  in  the  future,  these  CG  types 
of  accelerations  will  certainly  be  very  interesting  for  further  study.  In  Section  3, 
hierarchical  computation  is  discussed.  Combining  it  with  the  other  accelerations 
makes  Stfd  a  competitive  parallel  iterative  method  for  real  applications.  The  re¬ 
maining  sections  discuss  several  other  issues  which  are  important  in  the  use  of 
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for  real  applications,  namely  decomposition  considerations,  solution  methods  for  the 
subregions  and  convergence  checking.  Due  to  lack  of  much  real  experience  on  the 
new  parallel  computers  and  the  rapid  advances  in  hardware  and  software,  further 
studies  are  needed  for  these  topics. 

6.1  SOR  Acceleration  and  Multi-Color  Splitting 

Among  the  many  possible  acceleration  methods,  the  SOR  acceleration  is  an  attrac¬ 
tive  choice.  It  is  easy  to  implement  and  its  theoretical  background  is  well  under¬ 
stood.  The  local  communication  pattern  of  this  method  is  also  an  appealing  feature 
For  parallel  computation. 

As  we  discussed  in  the  last  chapter,  is  actually  the  following  block  Gauss- 
Seidel  iteration  1: 

(Em  0  Lk  +  Tm  0  /*)*(fc+1)  =  ~(Fm  0  Uk)xW  +  b.  (6.1) 

Thus,  an  obvious  choice  for  an  acceleration  scheme  is  the  SOR  acceleration.  We 
can  construct  a  new  approximation  x^k+I\ 

x(fc+1)  =  u/x(fc+1)  +  (1  -  u)xw, 

and  then  attempt  to  choose  am  optimal  relaxation  parameter  u>  to  speed  up  the 
convergence.  Since  the  connections  between  the  subregions  merely  involve  artificial 
boundary  values,  the  relaxations  are  carried  out  only  for  those  boundaries.  Late 
in  this  section  we  will  present  a  general  convergence  result:  for  any  choice  of  the 
relaxation  parameter  between  0  and  2,  this  scheme  will  converge  to  the  true  solution. 
But  as  we  know,  SOR  cannot,  be  successfully  applied  to  am  arbitrary  matrix.  The 
following  famous  example  is  due  to  Kahan  [Kah58]  : 


Here  we  exhibit  the  case  of  a  one-dimensional  model  problem. 
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where  0  <  a  <  1.  This  matrix  is  an  A/-matrix  and  is  positive  definite.  A  short 
calculation  will  show  that  the  optimal  relaxation  factor  is  u  =  1,  which  is  equivalent 
to  saying  that  SOR  acceleration  does  not  help  in  this  case.  Some  further  restrictions 
on  the  iterative  matrix  axe  needed.  For  instance,  the  well  known  property  A  and 
the  consistent  ordering  of  the  iterative  matrix  will  guarantee  a  successful  relaxation 
iteration.  Fortunately,  we  are  able  to  construct  an  algorithm  which  satisfies  these 
restrictions  on  the  iteration  matrix  of  the  multi-color 

Let  us  start  with  some  simpler  cases.  Since  we  have  found  the  eigenstructure 
of  the  iterative  matrix  for  the  model  problem,  the  analysis  of  the  application  of 
SOR  to  the  model  problem  is  straightforward.  As  we  see  from  equation  (5.2),  this 
SfM  has  property  2.  It  is  easy  to  verify  that  natural  ordering  and  red-black 
ordering  of  the  subregions  will  both  lead  to  a  consistent  ordering  in  the  matrix. 
Therefore,  the  classical  analysis  of  the  SOR  theory  can  be  applied  here  directly! 
We  have  calculated  the  spectral  radius  p  of  the  Jacobi  iterative  matrices  for  the 
model  problems  in  any  dimension,  the  optimal  relaxation  factor  can  be  calculated 
from  the  following  formula: 


~  1  +  vT^T1 

Table  6.1  lists  some  comparisons  between  plain  and  its  SOR  acceleration  for  the 
one-dimensional  model  problem.  The  third  column  of  this  table  lists  the  number  of 
iterations  needed  to  reduce  the  error  by  a  factor  of  105  for  plain  ■5^,  while  the  r-'urth 
column  lists  the  same  quantities  for  SOR  acceleration  with  the  optimal  relaxation 
factor.  The  last  two  columns  are  the  experimental  and  theoretical  optimal  relaxation 
factors  for  the  same  cases.  We  can  see  that  they  agree  very  well.  There  is  a  detailed 
discussion  and  the  results  of  many  experiments  are  presented  in  the  paper  [OST86], 
which  we  will  not  repeat  here.  As  we  see  in  this  table,  the  improvement  of  the  SOR 
acceleration  is  significant.  For  the  higher-dimensional  problems,  we  could  also  make 
a  similar  table  using  the  spectral  radius  we  obtained  in  the  last  chapter.  The  next 
several  sections  will  discuss  interesting  issues  which  appear  when  this  method  is 
applied  to  more  general  cases. 

^Property  is  an  extension  of  Young’s  famous  property  A  to  the  block  matrix  case.  Since  we 
consider  only  block  matrices  in  this  chapter,  we  will  later  omit  the  adjective  block. 
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SOR  Acceleration  Results  for  1- 

D  Model  Problem 

Number  of 
subregions 

Overlap 

ratio 

Plain 

Optimal  u 
observed 

Optimal  u 
from  theory 

0.182 


0.095 


0.039 


0.019 


0.427 


0.309 


0.250 


0.175 


0.071 


0.030 


0.333 

44 

12 

1.42 

1.36 

0.220 

70 

16 

1.5 

1.46 

0.167 

97 

18 

1.52 

1.51 

0.083 

204 

26 

1.67 

1.63 

0.333 

88 

17 

1.55 

1.50 

0.190 

169 

26 

1.62 

1.61 

0.167 

193 

26 

1.65 

1.64 

0.083 

397 

36 

1.75 

1.74 

6.1.1  Multi-Color  SAM  and  Consistent  Ordering 


As  we  mentioned  above,  some  restrictions  sue  needed  to  ensure  the  success  of  the 
relaxation  iteration.  A  well  known  candidate  class  of  matrices  is  those  which  have 
property  A^\  or  more  generally,  block  p-cyclic  matrices.  An  n  x  n  matrix  A  is 
p-cyclic  if  there  is  an  permutation  matrix  P  such  that  PAPT  is  of  the  following; 


\% 

[v? 


\M 

.'S»l 


form: 


papt  = 
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-Ai.i  0 
-42,1  ^2,2 


$ 


[  o  0  APtP. 1  Ap,p  J 

But  simply  having  the  property  or  being  a  p-cyclic  matrix  is  not  enough. 
Consistent  ordering  is  also  needed.  A  detailed  discussion  of  these  concepts  is  given  in 
Varga’s  book  [Var62].  For  the  model  problem  which  is  decomposed  in  strip  fashion, 
these  requirements  are  automatically  satisfied.  The  block-tridiagonal  structure  of 
the  iterative  matrix  in  (4.2)  has  property  A W  and  is  consistently  ordered.  But, 
the  inherent  dependence  in  the  natural  ordering  of  the  equation  (5.2)  prevents  an 
efficient  parallel  implementation.  Instead,  red-black  ordering  is  commonly  applied. 
For  a  general  solution  region,  the  decomposition  has  to  be  carefully  implemented 
in  order  to  meet  these  requirements.  The  multi-color  proposed  in  Chapter  2  is 
a  way  to  obtain  a  block  p-cyclic  matrix.  If  we  impose  an  extra  restriction  on  the 
decomposition  such  that: 


ft(0  =  (J  ft*0  =  ft, 

i=i 


where  k  is  the  number  of  colors,  and  is  the  number  of  subregions  in  color  /, 
then  the  blocks  which  correspond  to  a  particular  color  only  need  to  be  connected 
to  the  previous  color  in  the  solution  order.  It  is  not  difficult  to  see  that  the 
for  this  splitting  is  a  consistently  ordered  p-cyclic  matrix.  When  p  =  2,  the  p- 
cyclic  matrix  is  a  block  2— cyclic  matrix,  which  is  usually  called  a  red-black  ordered 
block  matrix.  The  advantage  of  the  multi-color  splitting  is  the  parallelism  inherent 
in  this  decomposition.  Subregions  which  have  the  same  color  can  be  computed 
independently.  In  previous  chapters  we  also  mentioned  the  strategy  of  locating  the 
artificial  boundaries  near  the  middle  of  other  subregions  in  order  to  maximize  the 
reduction  of  the  error  on  these  boundaries.  If  we  group  the  subregions  into  only 
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two  colors,  this  requirement  is  very  hard  to  achieve.  This  is  the  major  reason  we 
are  motivated  to  propose  the  multi-color  splitting. 

6.1.2  The  Sensitivity  of  the  Relaxation  Factor  to  the  Overlap 

The  analysis  of  the  relationship  between  the  spectral  radius  p  and  the  overrelaxation 
factor  shows  that  when  p  is  close  to  1  the  rate  of  convergence  is  much  more 
sensitive  to  changes  in  the  relaxation  factor.  A  slight  perturbation  of  the  relaxation 
factor  can  result  in  a  big  degradation  in  performance.  This  is  not  good  for  practical 
implementations.  Since  the  spectral  radius  is  exponentiadly  related  to  the  overlap, 
the  sensitivity  of  the  rate  of  convergence  drops  dramatically  if  the  overlap  increases 
(see  Figure  6.1).  Although  the  increased  overlap  causes  more  work  in  each  iteration, 
the  total  work  is  still  less  than  it  is  for  a  smadl  overlap.  The  work  per  iteration 
increases  linearly  with  the  overlap,  while  the  spectral  radius  decreases  exponentially. 
In  Figure  6.1  the  relation  between  the  number  of  iterations  and  the  relaxation  factor 
is  shown.  These  results  are  all  for  the  two-dimensional  model  problem  in  a  unit 
square.  We  divide  the  square  into  5  overlapping  subregions.  The  six  curves  in  this 
figure  correspond  to  six  overlap  patterns,  which  have  different  overlapping  ratios. 
As  we  see  in  this  figure,  for  the  smallest  overlap  the  performance  of  the  method  is 
tremendously  sensitive  to  the  choice  of  the  relaxation  factor. 

This  figure  strongly  suggests  a  need  to  increase  the  overlap.  Now  a  natural 
question  to  raise  is  how  to  choose  the  best  overlap  ratio  for  a  given  number  of 
processors.  Let  us  study  the  two-dimensional  model  problem  again.  We  divide  the 
unit  square  successively  into  2,  3,  4,  6,  8,  10  overlapping  subregions.  Then  for  each 
case  we  vary  the  overlap  ratio  from  0  to  0.5.  Using  the  spectral  analysis  in  the  last 
chapter,  we  can  calculate  the  total  work  needed  to  reduce  the  error  by  a  factor  of 
10s.  Figure  6.2  shows  the  relation  between  the  overlap  and  the  total  work  for  these 
six  cases.  From  this  figure  we  can  see  that  although  the  spectral  radius  will  be 
minimum  for  an  overlap  ratio  of  0.5,  in  terms  of  the  total  work  the  optimal  overlap 
ratio  is  somewhat  less  than  0.5.  When  k  decreases,  the  spectral  radius  increases. 
But,  if  the  change  of  overlap  is  small  the  number  of  iterations  needed  for  reducing 
the  norm  of  the  error  by  a  fixed  factor  does  not  change.  Thus,  the  total  work  will 
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Figure  6.1:  The  number  of  iterations  as  a  function  of  the  overrelaxation  factor  u 

decrease  until  the  number  of  iterations  jumps.  That  is  the  reason  why  these  curves 
show  saw-tooth  shapes.  As  the  number  of  the  processors  increases,  however,  the 
optimal  overlap  ratio  will  approach  0.5. 


6.1.3  A  Local  Relaxation  Strategy 

SOR  acceleration  has  a  very  efficient  parallel  implementation,  but  unfortunately,  the 
estimate  of  the  relaxation  factor  still  requires  global  information  exchange  in  gen¬ 
eral.  This  is  a  well  known  problem  which  causes  the  parallel  efficiency  to  degrade. 
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Recently,  a  new  technique3  which  uses  local  relaxation  factors  has  received  some 
attention.  The  motivation  for  this  technique  was  to  try  to  find  a  more  efficient  re¬ 
laxation  technique  for  irregular  meshes  or  variable  coefficient  problems  which  could 
avoid  estimating  of  the  spectral  radius  of  the  Jacobi  matrix  and  obtain  more  error 
reduction  than  the  uniform  relaxation  factor.  The  reasoning  behind  this  technique 
is  very  convincing.  Since  the  relaxation  is  a  local  operation,  the  relaxation  factor 
should  also  be  characterized  well  by  local  features.  Experience  has  shown  that  this 
idea  works  well  for  many  test  problems.  Of  late,  the  locality  of  the  communication 
in  this  method  has  obtained  the  attention  of  the  parallel  computation  community. 
C.  Kuo,  B.  Lever  and  B.  Musicus  [KLM86]  apply  this  idea  to  a  mesh— connected 
array. 

The  basic  idea  of  the  local  relaxation  method  is  to  determine  a  relaxation  factor 
for  each  individual  grid  point.  Consider  a  five-point  difference  equation 

OgXg  •(■  -f  ^  "f  o0x<>  — 

where  x0  is  a  grid  function  located  at  the  position  (*,>)  4,  x„  xn,  x„  and  xw  are 
the  grid  functions  located  to  the  south,  north,  east  and  west  of  x„,  respectively,  and 
a0,  a,,  a,,  a.,  are  the  corresponding  coefficients.  Suppose  that  there  are  N 
and  XI  grid  points  in  the  row  i  and  column  j  in  which  x0  is  located.  Now  we  may 
imagine  that  there  is  an  N  x  M  rectangular  grid  and  that  each  grid  point  has  the 
same  difference  equation  as  x0.  Then  the  spectral  radius  of  the  Jacobi  matrix  for 
this  problem  is 


Pr  =  -  [VS^cos  —  +  ^cos^^j  . 

Therefore,  the  optimal  relaxation  factor  for  this  imaginary  rectangular  grid  is: 


We  will  use  this  u as  the  local  relaxation  factor  for  grid  point  x0.  We  can  obtain 
different  relaxation  factors  for  each  grid  point  which  are  only  related  to  the  local 

3lt  la  Also  called  the  ad-hoc  SOR  method.  See  [Erh81],  [Erh84] 

4The  solution  region  need  not  be  rectangular. 
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features  surrounding  the  point  in  this  way.  When  the  solution  region  is  rectangular 
and  the  coefficients  of  the  PDE  are  constant,  the  local  relaxation  factor  is  the 
same  as  the  global  optimal  relaxation  factor.  Experimental  results  show  that  this 
relaxation  scheme  is  successful  for  many  test  problems. 

The  same  idea  can  also  be  applied  to  the  SOR  acceleration  for  The  calcu¬ 
lation  of  the  spectral  radius  for  the  model  problem  can  be  easily  generalized  to  a 
general  second  order  elliptic  PDE  with  constant  coefficients.  As  we  have  seen  in  the 
last  chapter,  the  estimate  of  the  spectral  radius  only  involves  the  overlap  ratio  and 
information  about  the  shape  of  the  subregions,  both  of  which  are  local  information. 
If  we  want  to  estimate  a  local  relaxation  factor  for  an  artificial  boundary,  which 
is  located  in  some  other  subregion,  we  may  imagine  two  overlapping  rectangular 
subregions  and  let  the  shapes  of  these  rectangular  regions  be  as  close  to  the  real 
ones  as  possible.  Then  we  may  use  the  estimate  of  the  relaxation  factor  for  the 
rectangular  regions  as  the  relaxation  factor.  Thus,  global  information  exchange  can 
be  avoided.  In  the  next  subsection  we  will  prove  that,  for  any  choice  of  u>  between 
0  and  2  for  each  subregion,  the  iteration  will  converge. 

This  local  relaxation  method  has  been  successful  experimentally.  Theoretical 
analysis  of  the  relationship  between  these  relaxation  factors  and  the  convergence 
rate  remains  a  very  interesting  open  problem. 


6.1.4  The  Convergence  Proof  for  Local  Relaxation 


In  Chapter  2  a  multi-color  for  elliptic  PDE’s  was  introduced  and  an  extension  to 
a  positive  definite  matrix,  called  multi-color  •?>,  was  mentioned.  Here  the  detailed 
definition  of  this  splitting  is  presented.  Combining  it  with  the  local  relaxation 
method,  we  may  prove  the  following  theorem: 


Theorem  0.1  When  the  multi-color  is  applied  to  a  positive  definite  matrix,  if 
every  which  is  used  as  the  relaxation  factor  for  block  j  of  color  i,  satisfies 


0  <  wj0  <  2, 


then  the  relaxation  process  of  the  multi-color  $S  converges. 


ws 
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Proof.  The  basic  idea  of  this  proof  was  given  three  decades  ago  by  Ostrowski  [Ost56], 
who  used  the  same  idea  to  prove  convergence  of  the  group  relaxations.  The  new 
aspect  here  is  that  we  combine  two  old  techniques,  fyM  and  group  relaxation,  to 
obtain  an  efficient  parallel  implementation. 

Let 

Ax  =s  b  (6.2) 


be  the  linear  system  of  equations,  where  A  is  a  positive  definite  matrix.  The  multi¬ 
color  %  with  local  relaxation  can  be  described  as  follows:  First  we  find  p  permu¬ 
tation  matrices  P/,  /  =  1,  •  •  •  ,p  such  that  matrix  A  can  be  permuted  to  p  different 
partitioned  matrices,  where  p  is  the  number  of  colors: 


A(1) 

a<»> 

^1,2 

AW  1 

A(1) 

A(1) 

-^2,2 

A(1) 

A2  M 

•  • 

PI 

=  P1A{1)P? 

AW 

Akltl 

A(1) 

Aku2 

•  • 

AD 

AklM  J 

A (P) 

Ap) 

"1,2 

Ap)  1 

Al,kp 

Ap) 

A2,l 

Ap) 

"2,2 

Ap) 

A2,kp 
•  *  • 

PI 

=  P„A(p)Ppr 

4P\ 

A (p) 
Ak„  2 

KP'*P  J 

Local  SOR  relaxation  is  then  applied  to  each  of  the  blocks  A\J  as  follows: 


aQxT>  =  «J»  -  £  43*T  ’  -  E  +  (1  -  -IVS#'. 


The  motivation  for  this  algorithm  came  from  the  fact  that  the  original  block  relax¬ 
ation  (or  group  relaxation)  still  suffered  from  slow  convergence.  After  we  studied 
the  inverse  structure  of  the  sparse  matrices  in  Chapter  3,  we  noticed  that  the  er¬ 
ror  decay  rates  for  different  variables  in  the  same  block  differed  greatly.  The  plain 
block  relaxation  failed  to  take  advantage  of  the  exponential  decay  of  the  inverse  of 
a  sparse  matrix.  The  multi-color  •?>  tries  to  put  every  variable  within  the  fast  decay 
area  of  some  color  (geometrically,  we  may  say  “near  the  middle  of  some  subregion”). 


6.2.  Other  Classical  Acceleration  Schemes 
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As  we  show  in  the  model  problem  cases,  it  eventually  yields  a  method  which  has 
optimal  complexity. 

The  proof  of  convergence  is  straightforward.  It  is  well  known  that  for  each 
positive  definite  operator  there  is  a  functional 

F(X)=(AX,X)-2(b,X) 

which  corresponds  to  the  system  of  equations,  such  that  the  vector  which  achieves 
the  minimum  of  this  functional  is  the  solution  of  this  system.  The  iteration  process 
can  be  viewed  as  the  process  of  minimizing  this  functional. 

Let  us  consider  each  calculation  of  a  block  as  one  step  of  the  algorithm.  It  is 
easy  to  verify  that  the  decrease  in  the  value  of  the  functional  for  two  consecutive 
iterations  is  as  follows: 

F(X<*+1))  -  F(X,(k))  =  -J,n(2  -  u/J°)(rJ*\  A|3"1ri*)) 

where  rjk)  is  the  residual  of  block  i  in  color  l.  If  every  u>f‘*  satisfies  0  <  w) <  2  ,  the 
sequence  of  F(X-k))  monotonically  decreases.  Using  arguments  of  Ostrowski,  we 
can  prove  that  this  sequence  will  converge  to  the  solution  of  (6.2).  This  concludes 
our  proof  of  convergence. 

6.2  Other  Classical  Acceleration  Schemes 

Applying  $  to  equation  (3.12)  we  have 

M,x(k+l)  =  N.x{k)  +  /, 

=  G.z(k)  +  AC1/, 

where  G,  =  M~l  N,.  This  is  a  typical  form  of  the  baste  iteration.  Many  acceleration 
schemes  for  this  iteration  are  available.  There  is  an  excellent  survey  and  comparison 
for  them  in  L.  Hageman  and  D.  Young’s  book,  Applied  Iterative  Methods.  We 
will  not  repeat  their  comparison  here.  As  we  mentioned  in  the  beginning  of  this 
chapter,  some  of  these  methods  are  not  discussed  here  due  to  the  high  cost  of 
the  communication  overhead  in  a  parallel  computer  environment.  Among  these 
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different  schemes,  the  popular  Chebychev  acceleration  is  rather  interesting  in  this 
case.  The  adaptive  Chebychev  procedure  still  suffers  from  the  high  cost  of  the  heavy 
global  information  exchange  in  each  iteration,  but  if  we  have  some  knowledge  of  the 
eigenvalues  of  the  iterative  matrix  G ,,  then  optimal  Chebychev  acceleration  is  very 
attractive.  Particularly,  for  the  model  problem  a  detailed  analysis  of  the  eigenvalues 
for  the  $S  is  available.  The  application  of  the  optimal  Chebychev  acceleration  to 
the  model  problem  is  therefore  straightforward.  Let 

m(A)  =  min  A*, 

M(A)  =  max  A*. 


From  Chapter  4,  we  have 


m(G.)  =  -M(G.) 


\  ^  sinh(/c^iir)  ■+■  sinh((l  „ 

M(c->  - - - •  (6'3) 

The  test  results  in  the  last  chapter  show  that  the  bound  (6.3)  is  very  accurate5. 
Moreover,  all  eigenvalues  of  the  matrix  G,  are  real  provided  that  the  overlap  ratio 
k  <  0.5®.  Applying  the  estimate  (6.3)  to  the  classical  formula  for  the  convergence 
rate  for  optimal  Chebychev  acceleration,  we  can  expect  this  acceleration  to  yield  an 
improvement  similar  to  that  offered  by  the  SOR  acceleration.  Unfortunately,  this 
result  cannot  be  generalized  to  the  other  cases  as  the  SOR  acceleration  can.  First 
of  all,  there  is  no  local  Chebychev  acceleration  available,  and  the  classical  adaptive 
Chebychev  procedure  requires  an  extensive  global  information  exchange.  Secondly, 
the  eigenvalues  of  the  iterative  matrix  G,  can  be  complex  in  general.  Chebychev 
acceleration  can  only  be  applied  to  some  of  the  complex  eigenvalue  cases.  How  to 
apply  the  Chebychev  acceleration  to  a  general  problem  is  still  an  interesting  open 
problem  in  some  sense. 


JIf  the  result  in  Theorem  4.1  is  used,  the  exact  value  of  M(G,)  can  be  estimate  by  some  numerical 
computations 

®There  is  no  advantage  in  making  *  >  0  5.  We  need  not  consider  this  case. 
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During  the  last  ten  years,  very  important  advances  in  computational  science  have 
been  made  in  the  area  of  the  hierarchical  computation.  Among  the  best  known 
techniques  are  multigrid  techniques  [Bra77],  adaptive  grid  met  hods  [Oli  8  4],  hierar¬ 
chical  information  flow[01i86].  Although,  their  approaches,  theoretical  foundations 
and  applications  axe  very  different,  one  idea  behind  these  techniques  is  the  same. 
According  to  the  particular  application,  the  computational  process  is  decomposed 
into  several  different  phases,  regions  or  grids,  which  we  will  abstract  as  a  hierar¬ 
chy.  Instead  of  using  one  uniform  approach  for  the  whole  problem,  we  treat  each 
component  of  the  hierarchy  separately,  attempting  to  choose  the  most  efficient  way 
of  obtaining  the  result  in  that  component.  The  components  of  the  hierarchy  will 
communicate  with  each  other,  and  after  some  assembling  or  iterations,  the  final 
result  can  be  obtained  in  a  very  efficient  way.  To  use  a  business  expression,  we 
might  say  that  we  are  only  willing  to  pay  what  we  have  to  pay.  This  same  philoso¬ 
phy  can  even  be  applied  to  the  design  of  the  computer  hardware  and  programming 
languages.  If  the  designs  of  the  computer  and  language  are  “smart”  enough,  it  is 
certainly  worthwhile  to  run  an  algorithm  in  such  a  way  that  in  the  different  stages 
of  the  computation  different  precision  of  arithmetic  are  used.  There  is  no  point 
in  using  double  precision  when  the  iteration  has  just  started.  The  same  idea  can 
certainly  be  adapted  to  the  acceleration  of  In  Chapters  2  and  4  we  have  stud¬ 
ied  the  convergence  rates  of  for  different  frequencies  for  both  continuous  and 
discrete  cases.  An  important  observation  from  the  analyses  is  that  the  slow  conver¬ 
gence  is  caused  by  the  low  frequency  errors.  Table  4.1,  which  lists  the  number  of  the 
iterations  required  for  reducing  the  error  corresponding  to  particular  frequencies  by 
a  factor  of  10®,  strongly  suggests  that  we  should  start  the  computation  at  the  coarse 
grid.  After  the  low  frequency  errors  converge  to  the  truncation  error  level  at  this 
grid,  we  should  then  refine  the  grid  and  continue  the  computation.  This  procedure 
can  be  recursively  repeated  until  the  results  of  the  desired  accuracy  are  obtained. 
We  have  used  the  model  problem  to  test  this  idea.  Our  results  show  that  the  cost 
is  substantially  reduced. 
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The  Numbers  of  Iterations  in  Each  Grid  Level 


Number  of 
subregions 


10 


Grid  Levels  (5  is  the  Finest  Grid)  'i  The  Equivalent  Numbers  of 

Iterations  on  the  Finest  Level 


112  13  4  5 


Total 


3  1111 


1.5 


7  3  2  2  2 


16 


2.7 


Table  6.1:  Hierarchical  computations 


The  test  problem  is  the  two-dimensional  model  problem  on  a  unit  square.  The 
finest  grid  is  [320  x  320].  There  are  5  grid  levels.  The  mesh  size  of  each  level  is 
double  that  of  the  previous  one.  The  iteration  starts  at  the  coarsest  grid.  After  the 
iteration  converges  to  the  level  of  truncated  error  at  this  particular  grid  size,  we 
refine  the  grid  and  continue  the  iteration  on  the  next  finer  grid,  and  so  on.  Table 
6.1  lists  the  number  of  iterations  carried  out  on  each  grid  level  for  two  different 
decompositions.  The  last  column  lists  the  total  work,  measured  as  the  equivalent 
number  of  iterations  on  the  finest  grid.  Although  the  total  number  of  fyM  iterations 
remains  the  same  as  it  would  have  been  for  a  single  fine  grid,  The  total  work  needed 
is  reduced  to  a  small  fraction  of  what  it  would  have  been. 


6.4  Decomposition  Considerations 

In  the  early  1950’s,  Kantorovich  and  Krylov  had  noticed  that  the  way  the  solution 
region  was  decomposed  would  affect  the  rate  of  convergence.  In  our  analyses  of 
the  model  problems  we  have  seen  that  the  rate  of  convergence  is  a  function  of  the 
overlap,  the  shape  of  the  subregions,  the  frequency  of  the  errors  and  the  dimension  of 
the  solution  regions.  The  first  important  issue  in  the  consideration  of  decomposition 
is  the  overlap.  For  the  model  problem,  the  overlap  can  be  characterized  by  a 
simple  quantity  k  (overlap  ratio).  But,  in  a  general  application,  k  can  no  longer 
be  used  for  this  purpose.  In  Chapter  5,  the  exponential  decay  law  was  seen  to 
be  the  reason  for  the  success  of  S$d.  From  this  law,  we  recognize  that  the  rates  of 
convergence  for  different  variables  (or  grid  points)  in  iteration  are  very  different. 
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The  errors  on  the  artificial  boundaries  will  affect  the  error  in  each  point  by  the 
influence  wavefront  of  the  out-web  in  the  inverse  operator.  Ideally,  we  would  like 
arrange  the  subregions  in  such  a  way  that  all  the  artificial  boundaries  are  located 
in  the  “Aeart”  of  smother  subregion  or  subregions7,  in  order  to  maximize  the  decay 
of  the  error  in  each  iteration.  For  the  model  problem,  this  can  be  easily  achieved 
by  setting  the  overlap  ratio  k  to  0.5.  In  general,  we  need  the  multi-color  splitting 
to  achieve  this  goal. 

Another  related  issue  is  the  shape  of  the  subregions.  For  the  model  problem, 
the  ratio  mjn  is  a  very  important  influence  on  the  convergence  rate,  where  m  and  n 
are  the  height  and  width  of  the  subregion,  respectively.  If  we  would  like  to  partition 
the  solution  region  into  many  subregions,  we  should  not  dissect  the  region  in  only 
one  direction(such  as  in  the  strip  case).  A  one-direction  dissection  would  result  in 
having  many  thin,  long  subregions,  leaving  the  artificial  boundaries  very  close  to 
the  boundaries  of  those  thin  subregions.  This  principle  is  also  applicable  to  general 
cases.  The  subregions  should  have  comparable  dimensions  in  every  coordinate. 
Any  small  width  in  one  coordinate  will  result  in  a  short  influence  wavefront  in  the 
inverse  of  the  operator  on  the  thin  subregion,  causing  slow  convergence  as  we  saw  in 
Chapter  3.  A  good  way  of  decomposing  the  solution  region  is  to  dissect  the  solurion 
region  in  k  directions,  where  k  is  the  dimension  of  the  solution  region.  Figure  6.3 
shows  a  dissection  in  two  directions  for  a  two-dimensional  problem.  A  comparison 
is  carried  out  for  two  kinds  of  dissection.  The  first  case  decomposes  the  square  into 
32  thin  strips.  The  second  one  is  to  decompose  the  square  into  36  rectangles.  The 
first  one  needs  40  iterations  while  the  latter  only  needs  15  iterations.  Even  though 
the  latter  case  has  almost  twice  as  many  variables  in  comparison  to  the  former  case, 
the  overall  work  in  the  two  direction  dissection  is  only  three  fourths  of  the  other 
one. 

Another  interesting  issue  in  the  decomposition  of  the  solution  region  is  the  au¬ 
tomation  of  the  dissection.  In  principal,  this  problem  is  similar  to  the  grid  genera¬ 
tion  problem  for  the  finite  element  method.  There  is  no  intrinsic  difficulty  in  this 

7The  motivation  ia  to  put  all  the  artificial  boundaries  in  the  quickly  converging  zone.  Then,  by 
the  maximum  principle,  after  one  iteration  the  total  error  will  be  bounded  by  the  error  on  these 
boundaries. 
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Figure  6.3:  A  dissection  in  two  directions 


problem. 

The  last  issue  is  the  mapping  of  the  decomposition  to  a  parallel  computer  ar¬ 
chitecture.  Since  the  efficient  communication  pattern  of  the  target  computer  will 
strongly  affect  the  choice  of  the  decomposition,  it  is  a  very  hardware  dependent 
issue.  We  did  not  include  such  mappings  in  this  study,  but  if  we  would  like  to  make 
a  really  competitive  method,  they  should  be  carefully  treated. 

6.5  Solution  Methods  for  the  Subregions 

The  choice  of  methods  for  the  solution  of  the  problem  on  the  subregions  is  also  an 
important  issue  for  applications  of&^f.  Because  of  the  inherent  modularity  in  the 
algorithm,  each  subregion  can  be  solved  using  a  different  method.  Depending 
on  the  particular  application,  we  may  take  advantage  of  this  flexibility.  For  example, 
we  can  use  a  fast  solver  or  even  am  analytic  method,  to  compute  the  solution  on 
a  regular  subregion.  Direct  and  iterative  methods  each  have  their  own  advantages 
and  disadvantages.  Iterative  methods  are  generally  preferred.  This  is  because,  at 
any  step,  the  result  from  the  last  iteration  is  a  very  good  initial  guess  for  the  next 
iteration.  But  this  does  not  mean  that  iterative  methods  always  win.  We  have 
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compared  the  multigrid  method  with  the  fast  solver  for  the  model  problem.  In  fact, 
the  fast  solver  wins  in  the  timing  comparison.  Comparison  of  the  two  programs 
will  show  that  this  result  should  not  be  a  surprise.  The  multigrid  method  has  more 
overhead  than  the  fast  solver  has.  The  advantage  in  complexity  for  the  multigrid 
method  only  becomes  dominant  in  a  very  large  problem.  For  sparse  problems,  if 
there  is  enough  memory  for  storing  the  LU  decomposition  in  each  processor,  a  sparse 
solver  cam  also  be  very  competitive.  The  complexity  of  the  work  in  each  iteration 
is  only  O(N)  in  this  case,  where  N  is  the  number  of  unknowns. 

A  strategy  of  incomplete  solution  in  solving  on  the  subregions  haw  been  tried. 
The  basic  idea  of  this  strategy  is  that  we  readly  do  not  need  a  very  accurate  so¬ 
lution  on  the  subregions  in  the  early  iterations.  If  am  iterative  method  is  used  for 
the  solution  on  the  subregions,  we  cam  aisk  if  we  cam  stop  the  iteration  at  some 
point  before  the  solution  converges?  The  preliminairy  results  are  disappointing.  For 
example,  we  have  applied  the  multigrid  method  to  solve  the  subproblems.  If  the 
number  of  V-cycles  or  W- cycles  for  solving  the  subproblems  is  reduced,  the  rate 
of  convergence  of  immediately  degenerates.  The  total  work  needed  to  converge 
is  also  increased.  G.  Rodrigue  has  ailso  haul  a  similar  experience.  Further  study  is 
needed  on  this  question.  It  seems  likely  that  a  way  cam  be  found  to  successfully  use 
incomplete  solutions. 


6.0  Convergence  Checking 


Until  communication  cost  became  am  important  factor  for  the  performance  of  a  par¬ 
allel  algorithm,  convergence  checking  was  never  an  efficiency  issue  in  implementing 
an  iterative  algorithm.  Due  to  the  requirement  of  global  information  exchange 
and  control,  the  convergence  check  in  an  iterative  adgorithm  has  to  be  carefully 
implemented.  The  granularity  of  is  very  desirable  in  this  aspect  since  coarse 
granularity  results  in  a  low  frequency  of  convergence  checking. 

In  addition,  the  hierarchical  computation  in  can  also  be  used  to  reduce  the 
cost  of  a  globed  convergence  checking.  It  is  clear  that  we  do  not  need  a  global  check  of 
convergence  until  the  finest  grid  is  reached.  During  the  computation  on  the  coarser 
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grids,  we  need  only  check  the  local  error  for  each  subregion  (or  processor).  Each 
subregion  will  keep  checking  two  things:  the  local  error  and  the  maximum  update 
of  the  variable  in  the  subregion.  If  the  local  error  is  good  enough  for  a  particular 
subregion,  the  corresponding  process  can  be  put  to  sleep  8.  If  the  maximum  update 
is  large,  then  any  of  the  neighbor  processes  which  are  sleeping  should  be  awakened, 
since  the  boundary  values  of  these  neighbors  have  changed  significantly.  All  these 
information  exchanges  of  convergence  information  are  local  and  cam  be  combined 
with  the  exchamge  of  boundary  values.  There  are  no  extra  communication  requests 
required  for  the  exchamge.  Thus,  the  cost  of  global  checking  will  only  be  required  on 
the  finest  grid.  As  our  experience  shows  that  only  one  or  two  iterations  are  needed 
on  this  grid,  the  overall  cost  of  communication  is  greatly  reduced.  In  general,  the 
global  error  checking  is  still  a  very  interesting  research  problem  for  any  iterative 
method  in  a  parallel  computer  environment.  Some  hardware  design  considerations 
can  be  very  helpful  in  resolving  the  efficiency  problem.  For  example,  if  the  control 
processor  can  check  a  built-in  flag  in  each  processor  at  a  very  low  cost,  then  the 
cost  of  the  checking  can  be  substantially  reduced. 


*In  »  time  sharing  system  the  processor  on  which  the  sleeping  process  was  running  can  be  recycled 
by  the  system 
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Conclusion 


This  thesis  has  reexamined  and  generalized  an  old  mathematical  technique  — 
Schwarz  alternating  method  ($$£)•  Through  the  convergence  analysis  for  the  model 
problem,  the  governing  factors  for  the  convergence  of  fyM  are  explored.  Using  this 
knowledge,  the  performance  of  SfiM  can  be  significantly  improved.  As  a  concrete 
example  of  the  improvement  in  performance,  let’s  apply  to  the  model  problem 


&U  &U  „  , 

U|r  =g(x,y) 


(z,y)€  (0,1)  x  (0,1), 


where  the  five-point  stencil  is  used,  and  the  mesh  size  is  1/320.  In  Table  7.1  we 
summarize  the  results  from  five  different  ways  of  applying  to  this  problem.  For 
each  implementation,  we  list  the  number  of  iterations  and  the  total  relative  work 
needed  in  reducing  the  norm  of  the  error  by  a  factor  of  10s,  as  well  as  the  con¬ 
vergence  factor.  In  the  first  approach,  the  unit  square  is  divided  into  5  strips  and 
each  strip  overlaps  with  its  neighbors  only  by  one  mesh  width.  The  Jacobi  type 
of  with  natural  ordering  is  applied.  As  we  might  expect,  the  convergence  is 
very  slow.  The  second  approach  is  to  increase  the  overlap  to  the  optimum,  namely, 
each  strip  now  overlaps  with  its  neighbor  by  half  of  its  width.  The  same  Jacobi 
iteration  is  used.  The  exponential  relationship  between  the  convergence  factor  and 
the  overlap  makes  a  big  improvement  in  the  performance.  Next,  Gauss-Seidel  S^f 
is  applied.  The  convergence  speed  is  doubled.  Then  SOR  acceleration  is  incorpo¬ 
rated.  We  list  a  result  for  which  the  optimal  u  is  used.  Again  the  performance  is 
improved  further.  Finally,  a  multilevel  grid  technique,  with  five  grid  levels,  is  com¬ 
bined  with  SOR  acceleration.  The  combination  of  these  four  modifications  yields  a 
significant  improvement  in  performance.  As  we  show  in  Chapter  6,  multi-direction 
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decomposition  can  lead  to  further  improvement. 


Iteration  1 

Convergence 

Number  of 

Total  relative 

technique 

factor 

iterations 

work 

with  minimal  overlap 

0.99308 

1658 

1658 

with  optimal  overlap 

0.90097 

110 

183.33  ! 

Gauss-Seidel 

0.8117 

55 

91.67 

SOR  +  (with  optimal  «*•) 

0.395 

13 

21 

Multi-level  grid  4-  SOR  -4-  ' 

0.395 

13 

2.1 

Table  7.1:  A  comparison  of  5  different  implementations  of  SAM 


We  have  incorporated  several  acceleration  strategies  in  this  example.  An  im¬ 
portant  factor  for  practical  application  is  that  these  accelerations  do  not  interfere 
with  each  other.  The  various  freedoms  in  which  we  mentioned  in  the  intro¬ 
duction  allow  us  to  combine  many  other  techniques  to  improve  the  performance 
when  we  apply  to  a  particular  problem.  Particularly,  generalizations  of 
,  Schwarz  splittings  (■?>),  are  introduced  in  this  thesis.  Thus,  we  can  apply  this 
powerful  technique  to  many  important  applications  other  than  elliptic  PDE’s. 

There  is  an  increasing  demand  for  parallel  algorithms,  the  inherent  parallelism, 
the  local  communication  pattern  and  the  hiding  of  global  information  exchange 
make  an  attractive  candidate  for  large  scale  computations  on  a  parallel  com¬ 
puter  with  non-shared  memory.  A  generalization  of  —  multi-color  —  is 
presented.  It  preserves  the  parallelism  of  the  original  while  provides  a  fast  con¬ 
vergence.  Many  parallel  implementation  issues  such  as:  local  relaxation  strategy: 
convergence  checking;  carrying  the  exchange  of  boundary  values  at  coarser  grid  level 
even  after  the  computation  has  proceeded  to  finer  grid  level  are  discussed  in  this 
thesis.  We  also  propose  some  open  problems  which  should  be  further  investigated. 

In  Chapter  4  we  discussed  the  problems  caused  by  the  matrix  structure.  The  ab¬ 
stract  form  of  a  matrix  creates  difficulties  for  observing  many  important  features  of 
a  linear  operator.  A  new  structure  template  operator ,  which  is  more  consistent  with 
the  form  of  the  original  continuous  operator  than  the  matrix  is,  has  been  developed. 
Using  this  new  structure,  we  have  presented  the  concepts  of  influencing  and  influ¬ 
enced  wavefronts  which  provide  tools  for  quantitatively  describing  the  exponential 
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decay  phenomenon.  Several  estimates  for  the  exponential  decay  are  shown  .n  ' 
study  providing  a  theoretical  basis  for  determining  when  $S  can  be  used  success*!; 

Although  is  an  very  old  mathematical  technique,  the  understanding  >f  * 
approach  is  still  young.  Particularly,  computational  experience  is  very  limited  < 
study  has  presented  a  promising  but  preliminary  investigation  Interesting  . ,.j 
problems  remaun  to  be  solved.  We  have  seen  increasing  interest  in  this  topic  art. 
numerical  analysts,  and  expect  Ss  to  become  a  competitive  and  popular  itera* 
technique. 
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The  Eigenvalues  of  the  matrix  Wn 


where 


= 


Ac  = 


A  Aj 

A)  .4j  j4j 


At  ss 


A)  A\  Ai 
Ac  A\ 


Ai  = 


(A.l) 


and  b.  a  >  0  We  will  discuss  the  calculation  of  the  eigenvalues  and  eigenvectors  for 
this  matrix  in  this  appendix. 

Before  we  calculate  the  eigenvalues  of  this  matrix,  the  following  result  is  useful 
for  later  discussion. 


Lemma  A.l  If  b  >  a  then 

a  +  b  > |  Aw,  j>  b  -  a, 

where  W,  «  any  eigenvalue  of  mafrtz  Wn. 

Proof  The  left  half  of  the  inequality  can  be  derived  directly  from  Gershgorin’s 
'heorem  Let 

*  =  Ui.---  .zin)7* 

be  the  corresponding  eigenvector  of  Aw,  and 

zk  =  max  {!  z,  |}  >  0. 
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For  Jfc  =  2  or  A:  =  2n  —  1,  we  have 

it 

u 

=  -Wn2l, 

or 

bZtn-i  —  Aiv„^2n- 

f 

/ 

So, 

1 

1  <W„  j>  6  >  6  —  a. 

For  2  <  k  <  (2 n  —  1),  we  have1 

1 

azfc.j  +  6z*  =  Xwn2k~i- 

£ 

rr 

Then, 

5 

£  • 

z*_i  ?*_2 

Aw,  a  =6. 

**  z* 

Finally,  we  have 

1 

i  Awn  1>  &  -  a* 

& 

§ 

& 

The  eigenvalue  and  eigenvector  problem  for  Wn  is  equivalent  to  the  boundary 
value  problem  of  the  matrix  difference  equation: 

f  AfjZk  +  (.Ai  —  pI)Zk+i  +  AiZk+2  =  0)  fc  =  l,---,n,  (A  2) 

\  Zq  —  Zn+l  =0 

> 

where  p  is  an  eigenvalue  of  Wn.  It  can  be  solved  easily  by  the  nonmonic  matrix 
polynomial  theory.  Here  we  will  use  the  same  notation  in  Gohberg’s  book  [GLR82]. 
It  is  interesting  that  the  spectral  theory  of  the  general  matrix  polynomials  1(A)  is, 
surprisingly,  of  very  recent  origin. 

•«ViL 

m 

*S‘>| 


=  1  or  k  =  2n  we  have 


bzj  =  Aw.?!, 
bzjn-l  =  A(V.^Jn 
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The  nonmonic  matrix  polynomial2 

1(A)  =  A2 A2  +  (A,  -  pi) A  +  Ao 


corresponds  to  the  matrix  difference  equation  (A. 2).  The  solution  of  (A. 2)  can  be 
expressed  in  terms  of  a  decomposable  pair  of  L( A).  Let  ( Xf ,  Jf)  and  (AT^,  J0 c)  be 
a  finite  and  an  infinite  Jordan  pair  respectively.  The  decomposable  pair  of  matrix 
polynomials  L{ A)  is 

([X^XooUi,©  Joo). 

From  Theorem  8.3  in  [GLR82]  we  know  that  the  general  solution  of  the  homogeneous 
finite  difference  equation  (A. 2)  is  given  by 

Z*  =  XF  JkFg,  k  =  0, 1,  •  •  • .  (A.3) 

A  short  calculation  shows  that 

det(£(A))  =  - X(apX 2  -  (a2  +  p2  -  b2) A  +  ap ). 


Then  the  eigenvalues  of  the  L(X)  are 


Ao 

Ai 

A2 


=  0, 

-  0,2  +  p2  ~  &2  +  b  ~  ~  p2^a  +  b)~  p2} 

2  ap 

a* ^((6  _  ay  -  pa)((q  +  b)  -  p2) 
2ap 


We  know  that  Aj,  Aj  ^  0.  The  eigenvectors  of  L{ A)  corresponding  to  the  eigenvalues 
A i  and  £(A,)  are 


xq 


L(  A0) 


a  0 
0  0 


2  A  matrix  polynomial 

k 

L(A)  =  £a,V 

i  =  0 

is  said  to  be  monic  if  A*  =  /,  otherwise  it  is  called  nonmonic.  Here  Ai  are  m  x  m  matrices. 


1 


1 


U  A,) 


L(\7) 


cl  —  pX  i  6A| 

gAj^  —  pAj 

q  — •  pX'i  bX 2 

bXi  qX^  —  pX'i 


where 


pX\  -  a 


pX^  —  a 


Wl  6At  ’  bX7 

Since  the  Jordan  chains  which  correspond  to  these  eigenvalues  all  have  only  one 
eigenvector  each,  the  finite  Jordan  pair  is  as  follows: 


XF 


Jf 


0  1  1 

1  UJ\  U)7 

0  0  0  " 

0  A!  0 

0  0  A2 


Now  we  may  use  the  general  solution  and  the  boundary  value  to  determine  the 
eigenvalues  and  eigenvectors  of  matrix  Wn.  Note 


and 


AoXf  = 


0  a  a 
0  0  0 


(.4i  —  pI)XfJf  — 


0  X\{uj\b  —  p)  Aj(u i7b  —  p ) 

0  Ai(6  —  pu/ v)  A —  P^i) 
0  —a  —a 

0  — X\^<1UJ\  —X7^QU>2 


A7XfJf  — 


0  0  0 

0  X\*cujj\  Aj^au/j 


.AcXf  +  (A,  -  pI)XfJf  +  A7XfJf 7  =  0. 


Thus,  the  general  solution  (A. 3)  does  satisfy  the  matrix  difference  equation  (A. 2). 
Now  let’s  determine  the  constant  vector  g  =  {go^gi,gi)T  to  satisfy  the  boundary 
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conditions.  For  the  first  boundary  condition 


Xf9  =  0, 


we  have 


9 1  —  ~9i 


and 


a(At  —  A2) 

9o  -  n  \ - 9 1 


bX  \  A3 


From  the  second  boundary  condition 

XFJFn+'g  =  0. 

Aj  and  A2  have  to  satisfy  the  following  condition: 


:  a.4) 


|  Aj  Aj  =  1 
^  ojiAib+1  =  wjAj'1"*"1 

The  first  equality  is  satisfied  from  the  definition.  If  b  >  a,  then  At  and  A2  are 
complex,  the  second  implies  the  following  condition: 


asinnl  =  /?sin(n  +  1)0, 


(A. 5) 


where 


X\  =  e  .  ( A. 6) 

Theorem  A.l  The  eigenvalues  of  the  matrix  Wn  satisfy  the  following  equations: 

Aw„2  +  2a  cos  BXwn  +  a2  —  b2  =  0.  ( A.  7) 


The  corresponding  eigenvectors  are 

xf(9)[  jF(9),(jF(e))2,---,(M*rr  }Tg(0)> 


where  B  is  the  root  of  the  following  equation: 


119 


v«*.«v 


Proof.  Equation  (A. 7)  can  be  directly  derived  from  the  definition  of  Then  we 
may  solve  Aw„  in  (A. 7)  and  substitute  it  into  (A. 5).  After  rearranging  the  terms  of 
sin((n  +  1)0)  ■  •  d  a,  (A. 8)  follows. 


Corollary  0  If  a  <  b,  matrix  Wn  ha s  2 n  distinguish  real  eigenvalues.  When  n.  is 
increased ,  p  w  also  increased3 . 


p  =  max  {|  Aw„  |}. 


A  short  calculation  shows  that  if  n  =  1 


p  =  6, 


(A. 9) 


and  if  n  =  2 


p  =  Jb(a  +  b). 


(A. 10) 


Equations  (A. 9)  and  (A. 10)  are  true  for  any  a  and  b. 


3Here  a  and  A  are  fixed. 


Appendix  B 

Extensions  of  the  Template  Operator 

Here  we  present  a  few  extensions  of  the  template  operator  introduced  in  Chapter 
4.  First,  the  template  operator  L  over  T‘,  where  s  >  1,  is  discussed.  Then  a  more 
generalized  operator  -block  template  operator-  is  considered.  In  the  last  section, 
other  kinds  of  operations  on  the  template  vectors  using  template  operators  are 
presented. 

B.l  Template  Operator  over  T* 

In  Chapter  4,  the  simplest  case  of  the  template  operator,  L  on  template  space  TJ , 
is  examined.  A  more  general  case,  the  template  operator  on  TJ,  where  s  >  1,  is 
considered  here. 

Given  a  template  vector  space  T*,  s  >  1,  the  template  operator  space  over  this 
space  is  defined  as  follows:  let 

Tn  =<  ■  ■  ,on  > 

be  the  template  of  T‘,  M,  be  the  space  of  all  s  x  3  matrices.  Construct  n  Cartesian 
products 

Ni  =  M,  x  Oi,  i  =  1,  •  ■  • ,  n. 

Let 

M*n  =  iVi  x  iV2  x  •  •  •  x  JV„. 

Each  element  U  €  M'n  consists  of  n  ordered  pairs 

U  =  {{<  >,  <  A/2,c>2  >,•••,<  A/n,o„  >)), 


>fi» 

♦S' 


B.l.  Template  Operator  over  T‘ 
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where  M0t ,  i  =  1,  •  •  • ,  n  are  3  x  s  matrices.  After  defining  the  operations  of  addition 
and  scalar  product  on  these  elements,  it  is  easy  to  see  that  the  space  At;  is  a  linear 
space.  We  call  it  a  template  matrix  space.  Each  element  m  6  At*  is  called  a 
template  matrix.  Now  construct  n  Cartesian  products 

Q i  =  At n  x  o,,  i  —  1 ,  •  ■  • ,  n . 


& 

t.y 

% 

to 


& 


w:> 


Let 

C  =  Qi  x  Qi  x  •••  x  Qn. 

Each  element  L  €  C*  consists  of  n  ordered  pairs 

L  =  [<  RuOi  >,<  R2,  02  >,•••,<  RrxxOrx  >]/, 

or  simply 

L  =  [i?o, ,  R<> , ,  •  •  • ,  -Ho*]/* 

As  in  the  case  when  s  =  1,  a  template  operator  space  £*  over  T’  with  the 
operations  of  addition  and  scalar  product  can  be  defined. 

Let 

R=  {{<  Ruoi  >,<  R2, oi  >,-■•,<  Rrx,on  >))  €  At; 

and 

X  =  {<  Xi,oi  >,<  x2,02  >,•••,<  x„,o„  >}  €  T’. 

Define  the  product  of  R  and  x  as 

Rx  =  J2 

o,€0 

where  Ro,x0 .  is  the  product  of  an  3  x  s  matrix  Ro,  and  a  vector  x0>.  Here  the  result 
Rx  is  an  3-dimensional  vector.  Now  we  can  define  the  operation  of  a  template 
operator  L  €  C'  on  a  template  vector  x  €  TJ.  Let 


We  define 


L  —  [.Hoj  i  Ro 3  >  ’  ’  >  Ron  f" 


y  —  Lx  —  {  R0 1  r ,  R0 j  x ,  •  •  ■ ,  Ro„  x  } . 
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From  this  definition,  we  know  that  y  =  Lx  €  T’  and  L  maps  T*  to  T‘.  We  may 
also  introduce  the  concepts  of  operating  template  matrix,  image  template  matrix, 
right  form  of  the  operator  and  so  on.  Since  they  are  so  similar  to  the  case  of  3  =  1, 
we  may  leave  them  to  the  reader. 

B.2  Group  Template  Operator 

In  many  cases,  we  need  to  group  the  nodes  in  a  template  into  a  few  sets,  each 
including  several  nodes.  The  number  of  the  nodes  cam  be  very  different.  Schwarz 
splitting  is  a  good  example  of  an  application  of  this  idea.  An  alternative  view  is  that 
each  node  in  a  template  is  associated  with  a  state  vector  which  may  have  different 
dimensions  in  different  nodes.  Here  the  group  template  operator  is  introduced  from 
the  second  point  of  view. 

Given  a  template  Tn,  and  n  vector  spaces  Vj,  i  =  1,  •  •  • ,  n,  where  the  dimension 
of  Vi  is  ki,  construct  n  Cartesian  products 

Si  =  ViX  Oi ,  i  =  1,  •  •  • ,  n. 


The  group  template  vector  space  Qn  is  the  set 

Qn  =  Si  X  Sj  X  •  •  •  X  Sn 

with  two  operations  — addition  and  scalar  product.  Each  element 
X  =  (<  Ii,Oi  >,<  X2,Ol  >,•••,<  xn,on  >) 


or 


x 


{l0,  ,  Xo^ , 


‘  '  1  XOn  } 


in  Qn  is  called  a  group  template  vector.  Let  Mij  denote  the  space  of  all  k,  x  k, 
matrices.  A  group  template  matrix  space  Qi  is  the  set 


with  two  operations  -addition  an-:  ''  a*ar  pr<»luct  Each  element  R  <=  Q,  is  u 

ordered  pairs 

R=  A/, . ,  .  A/,  •*,  -  ,  A/n.o„  >)) 

where  A/0l  is  a  it,  x  matrix.  Define  'he  product  of  a  group  template  matrix  /?  and 
a  group  template  vector  i  as  follows 

Rx  -  Y1 

j,60 

The  product  is  a  vector  of  dimension  k, . 

Then  the  group  template  operator  space  Cg  is  the  set 

Cg  =  {<2l  X  °l}  *  {Ql  X  Ol}  x  ■  *  {Qn  X  On} 


with  two  operations  — addition  and  scalar  product.  Each  element  L  6  C3  is  n 
ordered  pairs 

L  as  [<  Ri,  Ol  >,  <  R2,  Oi  >,•••,<  i?n,  On  >]/, 


or  simply 


L  [Hoi  )  R<)g  )  '  ’  '  )  R'On 


where  R0 ,  is  a  group  template  matrix  in  Qt.  The  definition  of  operating  a  group 
template  operator  £  on  a  group  template  vector  x  is  the  following: 


y  -  LX  —  {  Rq  j  X,  Ro?X,  »  R"On  x  } 

where  Jlo.x,  the  product  of  the  group  template  matrix  R0,  and  x,  is  a  vector  of 
dimension  ki.  The  group  template  operator  is  a  parallel  concept  of  the  partitioned 
matrix  in  the  matrix  structure.  As  in  the  last  section,  we  leave  many  of  the  defini¬ 
tions  to  the  reader. 


B.3  A  Template  Operator  Maps  Tn  into  Tm 


So  far  the  template  operators  we  have  discussed  are  mappings  for  which  the  domain 
and  range  are  the  same  space.  Here  a  more  generalized  template  operator  which 
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maps  one  template  space  T*  into  another  is  presented.  In  order  to  simplify  the 
notation,  we  only  consider  the  simplest  case,  that  is  the  case  when  s  =  1. 

Let 

Tn  =<  Oi,02,---  ,o»  » 


Tm  —  01 )  )  ■  •  • )  om 

be  the  templates  for  Tn  and  Tm ,  respectively.  The  left  form  of  a  template  operator 
L  which  maps  Tn  into  Tm  is 

L  =  [<  Rl,Oi  >,<  Rl,  02  >,*••,<  Rm,Om  >]/ 


where  is  the  operating  template  for  node  o,-  and  is  a  template  vector  in  Tn. 
The  corresponding  right  form  of  this  operator  is 

L  =  [<  Cl,  01  >,  <  Ca,  02  >,•••,<  C„,  On  >]r 

where  C0,  —  LIa the  image  template  for  node  o„  is  a  template  vector  in  7^.  We 
also  leave  the  rest  of  the  definitions  for  this  case  to  the  reader. 
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